Novel Streptococcus pneumoniae open reading frames encoding polypeptide antigens and uses thereof

ABSTRACT

The present invention relates to newly identified open reading frames comprised within the genomic nucleotide sequence of Streptococcus pneumoniae, wherein the open reading frames encode polypeptides that are surface localized on Streptococcus pneumoniae. Thus, the invention relates to Streptococcus pneumoniae open reading frames that encode polypeptides, to the polypeptides encoded by these open reading frames, to vectors comprising open reading frame sequences and to cells or animals transformed with these vectors. The invention also relates to methods of detecting these nucleic acids or polypeptides and to kits for diagnosing Streptococcus pneumoniae infection. Finally, the invention relates to pharmaceutical compositions, in particular immunogenic compositions, for the prevention and/or treatment of bacterial infection, in particular infections with Streptococcus pneumoniae.

This application is a continuation of co-pending U.S. application Ser. No. 12/778,530, filed May 12, 2010, which is a continuation of U.S. application Ser. No. 12/157,145, filed Jun. 6, 2008, now abandoned, which is a division of U.S. application Ser. No. 10/474,776, filed Jan. 5, 2004, issued as U.S. Pat. No. 7,384,775, which is the National Stage of International Application No. PCT/US02/11524, filed Apr. 12, 2002, which claims the benefit of U.S. Provisional Application Nos. 60/283,948, filed Apr. 16, 2001 and 60/284,443, filed Apr. 18, 2001. The disclosures of all the aforementioned priority applications are incorporated by reference in their entirety herein.

REFERENCE TO SEQUENCE LISTING

This application is being filed electronically via EFS-Web and includes an electronically submitted sequence listing in .txt format. The .txt file contains a sequence listing entitled “PC63509D_SequenceListing.txt” created on Jul. 9, 2012 and having a size of 1.54 MB. The sequence listing contained in this .txt file is part of the specification and herein incorporated by reference in its entirety.

FIELD OF THE INVENTION

The invention relates to Streptococcus pneumoniae genomic sequence and polynucleotide sequences encoding polypeptides of Streptococcus pneumoniae. More particularly, the invention relates to newly identified polynucleotide open reading frames comprised within the genomic nucleotide sequence of Streptococcus pneumoniae, wherein the open reading frames encode Streptococcus pneumoniae polypeptides, preferably polypeptides that are surface localized, secreted, membrane associated or exposed on Streptococcus pneumoniae.

BACKGROUND OF INVENTION

Streptococcus pneumoniae infections are a major cause of human diseases such as otitis media, bacteremia, meningitis, septic arthritis and fatal pneumonia worldwide (Butler et al., 1999; James and Thomas, 2000). Over the past 10-20 years, Streptococcus pneumoniae has developed resistance to most antibiotics used for its treatment. In fact, it is common for Streptococcus pneumoniae to become resistant to more than one class of antibiotic, e.g., β-lactams, macrolides, lincosamides, trimethoprim-sulfamethoxazole, tetracyclines (Tauber, 2000), meaning that treatment of Streptococcus pneumoniae infection is becoming more difficult.

Thus, the rapid emergence of multi-drug resistant pneumococcal strains throughout the world has led to increased emphasis on prevention of pneumococcal infections by immunization (Goldstein and Garau, 1997). The currently available 23-valent pneumococcal capsular polysaccharide vaccine is not effective in children of less than 2 years of age or in immunocompromised patients, two of the major populations at risk from pneumococcal infection (Douglas et al., 1983). A 7-valent pneumococcal polysaccharide-protein conjugate vaccine, recently licensed in the United States, was shown to be highly effective in infants and children against systemic pneumococcal disease caused by the vaccine serotypes and against cross-reactive capsular serotypes (Shinefield and Black, 2000). The seven capsular types cover greater than 80% of the invasive disease isolates in children in the United States, but only 57-60% of disease isolates in other areas of the world (Hausdorff et al., 2000). There is therefore an immediate need for a cost-effective vaccine to cover most or all of the disease causing serotypes of pneumococci. While this can be achieved by adding conjugates covering additional serotypes, efforts continue to find non-capsular vaccine antigens that are conserved among all pneumococcal serotypes and effective against pneumococcal disease.

Protein antigens of Streptococcus pneumoniae have been evaluated for protective efficacy in animal models of pneumococcal infection. Some of the most commonly studied candidate antigens include the PspA proteins, PsaA lipoprotein, and the CbpA protein. Numerous studies have shown that PspA protein is a virulence factor (Crain et al., 1990; McDaniel et al., 1984) but it is antigenically variable among pneumococcal strains. A recent study has indicated that some antigenically conserved regions of a recombinant PspA variant may elicit cross-reactive antibodies in human adults (Nabors et al., 2000). PsaA, a 37 kD lipoprotein with similarity to other gram-positive adhesins, is involved in Mn⁺ transport in pneumococci (Sampson et al., 1994; Dintilhac et al., 1997) and has also been shown to be protective in mouse models of systemic disease (Talkington et al., 1996). The surface exposed choline binding protein CbpA is antigenically conserved and protective in mouse models of pneumococcal disease (Rosenow et al., 1997). Since nasopharyngeal colonization is a prerequisite for otic disease, intranasal immunization of mice with pneumococcal proteins and appropriate mucosal adjuvants has been used to enhance the mucosal antibody response and thus the effectiveness of candidate antigens (Yamamoto et al., 1998; Briles et al., 2000).

While the PspA protein, PsaA lipoprotein and the CbpA protein antigens appear promising, it is possible that no one protein antigen will be effective against all Streptococcus pneumoniae serotypes. Laboratories therefore continue to search for additional candidates that are antigenically conserved and elicit antibodies that reduce colonization (important for otitis media), are protective against systemic disease, or both. Thus, there is an immediate need for a cost-effective vaccine to cover most or all of the disease causing serotypes of Streptococcus pneumoniae and methods of diagnosing Streptococcus pneumoniae infection. A better understanding of the genetic and molecular levels of Streptococcus pneumoniae infection will provide the basis for further development of preventative treatments, therapeutic treatments, new diagnostics and vaccine strategies which are specific for Streptococcus pneumoniae.

SUMMARY OF INVENTION

The present invention broadly relates to Streptococcus pneumoniae genomic sequence. More particularly, the invention relates to newly identified polynucleotide open reading frames comprised within the genomic nucleotide sequence of Streptococcus pneumoniae, wherein the open reading frames encode polypeptides that are surface localized, membrane associated, secreted, or exposed on Streptococcus pneumoniae.

Thus, in certain aspects, the invention relates to Streptococcus pneumoniae open reading frames that encode Streptococcus pneumoniae polypeptides. In preferred embodiments, these Streptococcus pneumoniae polypeptides are antigenic polypeptides. As defined hereinafter, a Streptococcus pneumoniae antigenic polypeptide, antigen or immunogen, is a Streptococcus pneumoniae polypeptide that is immunoreactive with an antibody or is a Streptococcus pneumoniae polypeptide that elicits an immune response. In other embodiments, the invention relates to the polynucleotides encoding these antigenic polypeptides. In other aspects, the invention relates to vectors comprising open reading frame sequences and cells or animals transformed, transfected or infected with these vectors. The invention relates also to methods of detecting these nucleic acids or polypeptides and kits for diagnosing Streptococcus pneumoniae infection. The invention further relates to pharmaceutical compositions, in particular immunogenic compositions, for the prevention and/or treatment of bacterial infection, in particular infections with Streptococcus pneumoniae.

In a preferred embodiment, the immunogenic compositions are used for the treatment or prevention of systemic diseases that are induced or worsened by Streptococcus pneumoniae. In another preferred embodiment, the immunogenic compositions are used for the treatment or prevention of non-systemic diseases, particularly otitis media, which are induced or worsened by Streptococcus pneumoniae.

In particular embodiments, an isolated polynucleotide of the present invention is a polynucleotide comprising a nucleotide sequence having at least about 95% identity to a nucleotide sequence chosen from one of SEQ ID NO: 1 through SEQ ID NO: 215 or SEQ ID NO:431 through SEQ ID NO:591, a degenerate variant thereof, or a fragment thereof. As defined hereinafter, a “degenerate variant” is a polynucleotide that differs from the nucleotide sequence shown in SEQ ID NO:1 through SEQ ID NO:215 and SEQ ID NO:431 through SEQ ID NO:591 (and fragments thereof) due to degeneracy of the genetic code, but still encodes the same Streptococcus pneumoniae polypeptide (i.e., SEQ ID NO:216 through SEQ ID NO:430 and SEQ ID NO:592 through SEQ ID NO:752) as that encoded by the nucleotide sequence shown in SEQ ID NO:1 through SEQ ID NO:215 and SEQ ID NO:431 through SEQ ID NO:591.
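The notion of a degenerate variant can be illustrated with a minimal Python sketch. It assumes the standard genetic code, and the two short sequences, as well as the function names `translate` and `is_degenerate_variant`, are hypothetical illustrations that are not taken from the sequence listing.

```python
# Standard genetic code, with codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a coding sequence codon by codon, ignoring a trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def is_degenerate_variant(reference: str, candidate: str) -> bool:
    """True when the nucleotide sequences differ but encode the same polypeptide."""
    return reference.upper() != candidate.upper() and translate(reference) == translate(candidate)


# Hypothetical sequences (illustration only): synonymous codons, same peptide.
REFERENCE = "ATGCTGCCGAAAACCGGT"
VARIANT = "ATGTTACCAAAGACAGGC"

print(translate(REFERENCE), translate(VARIANT))
print(is_degenerate_variant(REFERENCE, VARIANT))
```

Both hypothetical sequences differ at several third-codon positions yet translate to the same six-residue peptide, which is the property the definition above relies on.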

In other embodiments, the polynucleotide is a complement to a nucleotide sequence chosen from one of SEQ ID NO: 1 through SEQ ID NO: 215 or SEQ ID NO:431 through SEQ ID NO:591, a degenerate variant thereof, or a fragment thereof. In yet other embodiments, the polynucleotide is selected from the group consisting of DNA, chromosomal DNA, cDNA and RNA and may further comprise heterologous nucleotides.

In another embodiment, the invention comprises an isolated polynucleotide that hybridizes to a nucleotide sequence chosen from one of SEQ ID NO: 1 through SEQ ID NO: 215 or SEQ ID NO:431 through SEQ ID NO:591, a complement thereof, a degenerate variant thereof, or a fragment thereof, under high stringency hybridization conditions. In yet other embodiments, the polynucleotide hybridizes under intermediate stringency hybridization conditions.

In a preferred embodiment, an isolated polynucleotide of a Streptococcus pneumoniae genomic sequence comprises a nucleotide sequence chosen from one of SEQ ID NO: 1 through SEQ ID NO: 215 or SEQ ID NO:431 through SEQ ID NO:591, a fragment thereof, or a degenerate variant thereof, and encodes a polypeptide, a biological equivalent thereof, or a fragment thereof, selected from the group consisting of a Streptococcus pneumoniae polypeptide having 0, 1 or 2 transmembrane domains, a Streptococcus pneumoniae polypeptide having 3 or more transmembrane domains, a Streptococcus pneumoniae polypeptide having an outer membrane domain or a periplasmic domain, a Streptococcus pneumoniae polypeptide having an inner membrane domain, a Streptococcus pneumoniae polypeptide identified by Blastp analysis, a Streptococcus pneumoniae polypeptide identified by Pfam analysis, a Streptococcus pneumoniae lipoprotein, a Streptococcus pneumoniae polypeptide having a LPXTG motif, wherein the polypeptide is covalently attached to the peptidoglycan layer, a Streptococcus pneumoniae polypeptide having a peptidoglycan binding motif, wherein the polypeptide is associated with the peptidoglycan layer, a Streptococcus pneumoniae polypeptide having a signal sequence and a C-terminal Tyrosine or Phenylalanine amino acid, a Streptococcus pneumoniae polypeptide having a tripeptide RGD sequence, a Streptococcus pneumoniae polypeptide identified by proteomics as surface exposed and a Streptococcus pneumoniae polypeptide identified by proteomics as membrane associated.

In other embodiments, the isolated polynucleotide is a complement to a Streptococcus pneumoniae genomic sequence comprising a nucleotide sequence chosen from one of SEQ ID NO: 1 through SEQ ID NO: 215 or SEQ ID NO:431 through SEQ ID NO:591, a fragment thereof, or a degenerate variant thereof, and encodes a polypeptide, a biological equivalent thereof, or a fragment thereof, selected from the group consisting of a Streptococcus pneumoniae polypeptide having 0, 1 or 2 transmembrane domains, a Streptococcus pneumoniae polypeptide having 3 or more transmembrane domains, a Streptococcus pneumoniae polypeptide having an outer membrane domain or a periplasmic domain, a Streptococcus pneumoniae polypeptide having an inner membrane domain, a Streptococcus pneumoniae polypeptide identified by Blastp analysis, a Streptococcus pneumoniae polypeptide identified by Pfam analysis, a Streptococcus pneumoniae lipoprotein, a Streptococcus pneumoniae polypeptide having a LPXTG motif, wherein the polypeptide is covalently attached to the peptidoglycan layer, a Streptococcus pneumoniae polypeptide having a peptidoglycan binding motif, wherein the polypeptide is associated with the peptidoglycan layer, a Streptococcus pneumoniae polypeptide having a signal sequence and a C-terminal Tyrosine or Phenylalanine amino acid, a Streptococcus pneumoniae polypeptide having a tripeptide RGD sequence, a Streptococcus pneumoniae polypeptide identified by proteomics as surface exposed and a Streptococcus pneumoniae polypeptide identified by proteomics as membrane associated. In certain embodiments, the polynucleotide is selected from the group consisting of DNA, chromosomal DNA, cDNA and RNA and may further comprise heterologous nucleotides. In still other embodiments, the polynucleotide encodes a fusion polypeptide.

In a preferred embodiment, a polynucleotide encoding a polypeptide having 0, 1 or 2 transmembrane domains comprises a nucleotide sequence chosen from one of SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 11, SEQ ID NO: 13, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 32, SEQ ID NO: 34, SEQ ID NO: 36, SEQ ID NO: 39, SEQ ID NO: 41, SEQ ID NO: 42, SEQ ID NO: 45, SEQ ID NO: 47, SEQ ID NO: 49, SEQ ID NO: 50, SEQ ID NO: 51, SEQ ID NO: 53, SEQ ID NO: 55, SEQ ID NO: 57, SEQ ID NO: 58, SEQ ID NO: 60, SEQ ID NO: 61, SEQ ID NO: 62, SEQ ID NO: 63, SEQ ID NO: 64, SEQ ID NO: 66, SEQ ID NO: 67, SEQ ID NO: 68, SEQ ID NO: 69, SEQ ID NO: 70, SEQ ID NO: 72, SEQ ID NO: 73, SEQ ID NO: 74, SEQ ID NO: 78, SEQ ID NO: 79, SEQ ID NO: 81, SEQ ID NO: 83, SEQ ID NO: 85, SEQ ID NO: 86, SEQ ID NO: 89, SEQ ID NO: 91, SEQ ID NO: 92, SEQ ID NO: 95, SEQ ID NO: 96, SEQ ID NO: 97, SEQ ID NO: 100, SEQ ID NO: 104, SEQ ID NO: 105, SEQ ID NO: 106, SEQ ID NO: 109, SEQ ID NO: 110, SEQ ID NO: 111, SEQ ID NO: 113, SEQ ID NO: 116, SEQ ID NO: 121, SEQ ID NO: 122, SEQ ID NO: 123, SEQ ID NO: 125, SEQ ID NO: 126, SEQ ID NO: 127, SEQ ID NO: 128, SEQ ID NO: 131, SEQ ID NO: 132, SEQ ID NO: 134, SEQ ID NO: 136, SEQ ID NO: 137, SEQ ID NO: 138, SEQ ID NO: 141, SEQ ID NO: 142, SEQ ID NO: 143, SEQ ID NO: 144, SEQ ID NO: 147, SEQ ID NO: 148, SEQ ID NO: 149, SEQ ID NO: 150, SEQ ID NO: 155, SEQ ID NO: 156, SEQ ID NO: 158, SEQ ID NO: 161, SEQ ID NO: 162, SEQ ID NO: 165, SEQ ID NO: 170, SEQ ID NO: 171, SEQ ID NO: 172, SEQ ID NO: 174, SEQ ID NO: 176, SEQ ID NO: 179, SEQ ID NO: 183, SEQ ID NO: 185, SEQ ID NO: 187, SEQ ID NO: 192, SEQ ID NO: 195, SEQ ID NO: 196, SEQ ID NO: 197, SEQ ID NO: 199, SEQ ID NO: 200, SEQ ID NO: 201, SEQ ID NO: 202, SEQ ID NO: 204, SEQ ID NO: 205, SEQ ID NO: 207, SEQ ID NO: 209 and SEQ ID NO: 210.

In another preferred embodiment, a polynucleotide encoding a polypeptide having 3 or more transmembrane domains comprises a nucleotide sequence chosen from one of SEQ ID NO: 2, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 10, SEQ ID NO: 12, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 20, SEQ ID NO: 21, SEQ ID NO: 26, SEQ ID NO: 27, SEQ ID NO: 30, SEQ ID NO: 31, SEQ ID NO: 33, SEQ ID NO: 35, SEQ ID NO: 37, SEQ ID NO: 38, SEQ ID NO: 40, SEQ ID NO: 43, SEQ ID NO: 44, SEQ ID NO: 46, SEQ ID NO: 48, SEQ ID NO: 52, SEQ ID NO: 54, SEQ ID NO: 56, SEQ ID NO: 59, SEQ ID NO: 65, SEQ ID NO: 71, SEQ ID NO: 75, SEQ ID NO: 76, SEQ ID NO: 77, SEQ ID NO: 80, SEQ ID NO: 82, SEQ ID NO: 84, SEQ ID NO: 87, SEQ ID NO: 88, SEQ ID NO: 90, SEQ ID NO: 93, SEQ ID NO: 94, SEQ ID NO: 98, SEQ ID NO: 99, SEQ ID NO: 101, SEQ ID NO: 102, SEQ ID NO: 103, SEQ ID NO: 107, SEQ ID NO: 108, SEQ ID NO: 112, SEQ ID NO: 114, SEQ ID NO: 115, SEQ ID NO: 117, SEQ ID NO: 118, SEQ ID NO: 119, SEQ ID NO: 120, SEQ ID NO: 124, SEQ ID NO: 129, SEQ ID NO: 130, SEQ ID NO: 133, SEQ ID NO: 135, SEQ ID NO: 139, SEQ ID NO: 140, SEQ ID NO: 145, SEQ ID NO: 146, SEQ ID NO: 151, SEQ ID NO: 152, SEQ ID NO: 153, SEQ ID NO: 154, SEQ ID NO: 157, SEQ ID NO: 159, SEQ ID NO: 160, SEQ ID NO: 163, SEQ ID NO: 164, SEQ ID NO: 166, SEQ ID NO: 167, SEQ ID NO: 168, SEQ ID NO: 169, SEQ ID NO: 173, SEQ ID NO: 175, SEQ ID NO: 177, SEQ ID NO: 178, SEQ ID NO: 180, SEQ ID NO: 181, SEQ ID NO: 182, SEQ ID NO: 184, SEQ ID NO: 186, SEQ ID NO: 188, SEQ ID NO: 189, SEQ ID NO: 190, SEQ ID NO: 191, SEQ ID NO: 193, SEQ ID NO: 194, SEQ ID NO: 198, SEQ ID NO: 203, SEQ ID NO: 206, SEQ ID NO: 208, SEQ ID NO: 211, SEQ ID NO: 212, SEQ ID NO: 213, SEQ ID NO: 214 and SEQ ID NO: 215.

In other preferred embodiments, a polynucleotide encoding a polypeptide having an outer membrane domain or a periplasmic domain comprises a nucleotide sequence chosen from one of SEQ ID NO: 3, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 23, SEQ ID NO: 39, SEQ ID NO: 50, SEQ ID NO: 62, SEQ ID NO: 67, SEQ ID NO: 78, SEQ ID NO: 85, SEQ ID NO: 125, SEQ ID NO: 134, SEQ ID NO: 147, SEQ ID NO: 165, SEQ ID NO: 172 and SEQ ID NO: 179.

In other preferred embodiments, a polynucleotide encoding a polypeptide having an inner membrane domain comprises a nucleotide sequence chosen from one of SEQ ID NO: 2, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21, SEQ ID NO: 22, SEQ ID NO: 26, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 30, SEQ ID NO: 31, SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 38, SEQ ID NO: 40, SEQ ID NO: 43, SEQ ID NO: 44, SEQ ID NO: 46, SEQ ID NO: 47, SEQ ID NO: 48, SEQ ID NO: 51, SEQ ID NO: 52, SEQ ID NO: 53, SEQ ID NO: 54, SEQ ID NO: 56, SEQ ID NO: 59, SEQ ID NO: 60, SEQ ID NO: 61, SEQ ID NO: 65, SEQ ID NO: 68, SEQ ID NO: 69, SEQ ID NO: 70, SEQ ID NO: 71, SEQ ID NO: 73, SEQ ID NO: 75, SEQ ID NO: 76, SEQ ID NO: 77, SEQ ID NO: 79, SEQ ID NO: 80, SEQ ID NO: 81, SEQ ID NO: 82, SEQ ID NO: 83, SEQ ID NO: 84, SEQ ID NO: 86, SEQ ID NO: 87, SEQ ID NO: 88, SEQ ID NO: 90, SEQ ID NO: 91, SEQ ID NO: 93, SEQ ID NO: 94, SEQ ID NO: 95, SEQ ID NO: 96, SEQ ID NO: 97, SEQ ID NO: 98, SEQ ID NO: 99, SEQ ID NO: 100, SEQ ID NO: 101, SEQ ID NO: 102, SEQ ID NO: 103, SEQ ID NO: 105, SEQ ID NO: 106, SEQ ID NO: 107, SEQ ID NO: 108, SEQ ID NO: 109, SEQ ID NO: 112, SEQ ID NO: 113, SEQ ID NO: 114, SEQ ID NO: 115, SEQ ID NO: 117, SEQ ID NO: 118, SEQ ID NO: 119, SEQ ID NO: 120, SEQ ID NO: 121, SEQ ID NO: 122, SEQ ID NO: 123, SEQ ID NO: 124, SEQ ID NO: 126, SEQ ID NO: 127, SEQ ID NO: 128, SEQ ID NO: 129, SEQ ID NO: 130, SEQ ID NO: 131, SEQ ID NO: 132, SEQ ID NO: 133, SEQ ID NO: 135, SEQ ID NO: 136, SEQ ID NO: 139, SEQ ID NO: 140, SEQ ID NO: 141, SEQ ID NO: 142, SEQ ID NO: 144, SEQ ID NO: 145, SEQ ID NO: 146, SEQ ID NO: 148, SEQ ID NO: 150, SEQ ID NO: 151, SEQ ID NO: 152, SEQ ID NO: 153, SEQ ID NO: 154, SEQ ID NO: 156, SEQ ID NO: 157, SEQ ID NO: 158, SEQ ID NO: 159, SEQ ID NO: 160, SEQ ID NO: 162, SEQ ID NO: 163, SEQ ID NO: 164, SEQ ID NO: 166, SEQ ID NO: 167, SEQ ID NO: 168, SEQ ID NO: 169, SEQ ID NO: 170, SEQ ID NO: 173, SEQ ID NO: 175, SEQ ID NO: 176, SEQ ID NO: 177, SEQ ID NO: 178, SEQ ID NO: 180, SEQ ID NO: 181, SEQ ID NO: 182, SEQ ID NO: 184, SEQ ID NO: 186, SEQ ID NO: 187, SEQ ID NO: 188, SEQ ID NO: 189, SEQ ID NO: 190, SEQ ID NO: 191, SEQ ID NO: 192, SEQ ID NO: 193, SEQ ID NO: 194, SEQ ID NO: 195, SEQ ID NO: 198, SEQ ID NO: 200, SEQ ID NO: 203, SEQ ID NO: 206, SEQ ID NO: 208, SEQ ID NO: 209, SEQ ID NO: 211, SEQ ID NO: 212, SEQ ID NO: 213, SEQ ID NO: 214 and SEQ ID NO: 215.

In yet another preferred embodiment, a polynucleotide encoding a polypeptide identified by Blastp analysis comprises a nucleotide sequence chosen from one of SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 7, SEQ ID NO: 10, SEQ ID NO: 12, SEQ ID NO: 16, SEQ ID NO: 20, SEQ ID NO: 24, SEQ ID NO: 27, SEQ ID NO: 30, SEQ ID NO: 31, SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 38, SEQ ID NO: 40, SEQ ID NO: 42, SEQ ID NO: 43, SEQ ID NO: 44, SEQ ID NO: 48, SEQ ID NO: 51, SEQ ID NO: 53, SEQ ID NO: 59, SEQ ID NO: 60, SEQ ID NO: 61, SEQ ID NO: 65, SEQ ID NO: 67, SEQ ID NO: 68, SEQ ID NO: 69, SEQ ID NO: 70, SEQ ID NO: 71, SEQ ID NO: 75, SEQ ID NO: 76, SEQ ID NO: 77, SEQ ID NO: 78, SEQ ID NO: 79, SEQ ID NO: 80, SEQ ID NO: 87, SEQ ID NO: 88, SEQ ID NO: 90, SEQ ID NO: 94, SEQ ID NO: 95, SEQ ID NO: 96, SEQ ID NO: 98, SEQ ID NO: 100, SEQ ID NO: 103, SEQ ID NO: 105, SEQ ID NO: 107, SEQ ID NO: 108, SEQ ID NO: 109, SEQ ID NO: 112, SEQ ID NO: 113, SEQ ID NO: 115, SEQ ID NO: 117, SEQ ID NO: 118, SEQ ID NO: 122, SEQ ID NO: 123, SEQ ID NO: 124, SEQ ID NO: 127, SEQ ID NO: 129, SEQ ID NO: 131, SEQ ID NO: 132, SEQ ID NO: 133, SEQ ID NO: 134, SEQ ID NO: 135, SEQ ID NO: 136, SEQ ID NO: 138, SEQ ID NO: 139, SEQ ID NO: 141, SEQ ID NO: 144, SEQ ID NO: 146, SEQ ID NO: 147, SEQ ID NO: 151, SEQ ID NO: 152, SEQ ID NO: 154, SEQ ID NO: 155, SEQ ID NO: 157, SEQ ID NO: 158, SEQ ID NO: 159, SEQ ID NO: 160, SEQ ID NO: 161, SEQ ID NO: 162, SEQ ID NO: 163, SEQ ID NO: 165, SEQ ID NO: 166, SEQ ID NO: 167, SEQ ID NO: 169, SEQ ID NO: 172, SEQ ID NO: 173, SEQ ID NO: 176, SEQ ID NO: 177, SEQ ID NO: 178, SEQ ID NO: 180, SEQ ID NO: 181, SEQ ID NO: 182, SEQ ID NO: 184, SEQ ID NO: 185, SEQ ID NO: 186, SEQ ID NO: 188, SEQ ID NO: 189, SEQ ID NO: 191, SEQ ID NO: 193, SEQ ID NO: 196, SEQ ID NO: 197, SEQ ID NO: 198, SEQ ID NO: 199, SEQ ID NO: 200, SEQ ID NO: 201, SEQ ID NO: 202, SEQ ID NO: 204, SEQ ID NO: 205, SEQ ID NO: 206, SEQ ID NO: 207, SEQ ID NO: 208, SEQ ID NO: 210, SEQ ID NO: 212, SEQ ID NO: 213 and SEQ ID NO: 214.

In still further preferred embodiments, a polynucleotide encoding a polypeptide identified by Pfam analysis comprises a nucleotide sequence chosen from one of SEQ ID NO: 4, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 41, SEQ ID NO: 45, SEQ ID NO: 55, SEQ ID NO: 57, SEQ ID NO: 58, SEQ ID NO: 63, SEQ ID NO: 64, SEQ ID NO: 66, SEQ ID NO: 72, SEQ ID NO: 74, SEQ ID NO: 89, SEQ ID NO: 92, SEQ ID NO: 104, SEQ ID NO: 111, SEQ ID NO: 116, SEQ ID NO: 119, SEQ ID NO: 128, SEQ ID NO: 137, SEQ ID NO: 142, SEQ ID NO: 143, SEQ ID NO: 149, SEQ ID NO: 151, SEQ ID NO: 152, SEQ ID NO: 153, SEQ ID NO: 157, SEQ ID NO: 159, SEQ ID NO: 160, SEQ ID NO: 162, SEQ ID NO: 163, SEQ ID NO: 164, SEQ ID NO: 165, SEQ ID NO: 166, SEQ ID NO: 169, SEQ ID NO: 171, SEQ ID NO: 174, SEQ ID NO: 176, SEQ ID NO: 180, SEQ ID NO: 182, SEQ ID NO: 183, SEQ ID NO: 184, SEQ ID NO: 186, SEQ ID NO: 188, SEQ ID NO: 189, SEQ ID NO: 195, SEQ ID NO: 198, SEQ ID NO: 199, SEQ ID NO: 205, SEQ ID NO: 212 and SEQ ID NO: 213.

In another preferred embodiment, a polynucleotide encoding a lipoprotein comprises a nucleotide sequence chosen from one of SEQ ID NO: 3, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 13, SEQ ID NO: 21, SEQ ID NO: 26, SEQ ID NO: 34, SEQ ID NO: 62, SEQ ID NO: 67, SEQ ID NO: 85, SEQ ID NO: 134, SEQ ID NO: 147, SEQ ID NO: 150, SEQ ID NO: 168, SEQ ID NO: 170 and SEQ ID NO: 173.

In other preferred embodiments, a polynucleotide encoding a polypeptide that has a LPXTG motif and is covalently attached to the peptidoglycan layer comprises a nucleotide sequence chosen from one of SEQ ID NO: 13, SEQ ID NO: 21, SEQ ID NO: 34 and SEQ ID NO: 170; or a polynucleotide encoding a polypeptide that has a peptidoglycan binding motif and is associated with the peptidoglycan layer comprises a nucleotide sequence chosen from one of SEQ ID NO: 25, SEQ ID NO: 49 and SEQ ID NO: 110.

In another preferred embodiment, a polynucleotide encoding a polypeptide having a signal sequence and a C-terminal Tyrosine or Phenylalanine amino acid comprises a nucleotide sequence chosen from one of SEQ ID NO:11, SEQ ID NO:39, SEQ ID NO:73, SEQ ID NO:97, SEQ ID NO:106, SEQ ID NO:125 and SEQ ID NO:187.

In yet another preferred embodiment, a polynucleotide encoding a polypeptide having a tripeptide RGD sequence that potentially is involved in cell attachment comprises a nucleotide sequence chosen from one of SEQ ID NO:1, SEQ ID NO:21, SEQ ID NO:66 and SEQ ID NO:67.
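The sequence-level features named in the preceding embodiments (LPXTG motif, tripeptide RGD sequence, C-terminal Tyrosine or Phenylalanine) can be checked on an amino acid sequence with simple pattern matching. The following Python sketch is illustrative only: the function name `motif_flags` and both example sequences are hypothetical, and the sketch does not predict signal sequences or confirm peptidoglycan attachment, which the embodiments above additionally require.

```python
import re

# "X" in LPXTG stands for any amino acid, so it is matched with "." here.
LPXTG_PATTERN = re.compile(r"LP.TG")
RGD_PATTERN = re.compile(r"RGD")


def motif_flags(protein: str) -> dict:
    """Report which of the three sequence features occur in a one-letter protein string."""
    protein = protein.upper()
    return {
        "lpxtg": LPXTG_PATTERN.search(protein) is not None,
        "rgd": RGD_PATTERN.search(protein) is not None,
        # Tyrosine (Y) or Phenylalanine (F) as the final residue.
        "c_terminal_tyr_or_phe": protein.endswith(("Y", "F")),
    }


# Hypothetical sequences for illustration; not taken from the sequence listing.
EXAMPLE_A = "MKKLAVRGDSTLPETGEE"
EXAMPLE_B = "MKTAALLSGDKF"

print(motif_flags(EXAMPLE_A))
print(motif_flags(EXAMPLE_B))
```

A match on the pattern alone is only a screening step; it does not show that the motif is functional in the encoded polypeptide.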

In another preferred embodiment, a polynucleotide encoding a polypeptide identified by proteomics as surface exposed comprises a nucleotide sequence chosen from one of SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:46, SEQ ID NO:64, SEQ ID NO:66, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:74, SEQ ID NO:91, SEQ ID NO:103, SEQ ID NO:116, SEQ ID NO:128, SEQ ID NO:131, SEQ ID NO:136, SEQ ID NO:151, SEQ ID NO:156, SEQ ID NO:159, SEQ ID NO:162, SEQ ID NO:164, SEQ ID NO:172, SEQ ID NO:176, SEQ ID NO:178, SEQ ID NO:179, SEQ ID NO:180, SEQ ID NO:182 and SEQ ID NO:205.

In still another embodiment, a polynucleotide encoding a polypeptide identified by proteomics as membrane associated comprises a nucleotide sequence chosen from one of SEQ ID NO:431 through SEQ ID NO:591.

In certain aspects, the invention relates to Streptococcus pneumoniae polypeptides. More particularly, the invention relates to Streptococcus pneumoniae polypeptides, more preferably antigenic polypeptides, encoded by Streptococcus pneumoniae polynucleotide open reading frames. Thus, in certain embodiments, an isolated polypeptide is encoded by a polynucleotide comprising a nucleotide sequence having at least about 95% identity to a nucleotide sequence chosen from one of SEQ ID NO: 1 through SEQ ID NO: 215 or SEQ ID NO: 431 through SEQ ID NO: 591, a degenerate variant thereof, or a fragment thereof. In a preferred embodiment, the isolated polypeptide encoded by one of the above polynucleotides comprises an amino acid sequence having at least about 95% identity to an amino acid sequence chosen from one of SEQ ID NO: 216 through SEQ ID NO: 430 or SEQ ID NO: 592 through SEQ ID NO: 752, a biological equivalent thereof, or a fragment thereof. In other embodiments, the polypeptide is a fusion polypeptide. In a preferred embodiment, the polypeptide immunoreacts with seropositive serum of an individual infected with Streptococcus pneumoniae.
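As a rough illustration of a percent identity threshold, the Python sketch below scores two sequences that are assumed to have been aligned already (equal length, with "-" marking gaps). The function name `percent_identity`, the choice of aligned columns as the denominator, the strict 95.0 cutoff and the example sequences are all assumptions made for illustration; the alignment method and the identity calculation actually intended are those set out elsewhere in the specification.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of aligned columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences have a gap.
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == gap and y == gap)
    ]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)


# Hypothetical pre-aligned sequences (illustration only): one mismatch in 20 columns.
SEQ_A = "MKTAYIAKQRQISFVKSHFS"
SEQ_B = "MKTAYIAKQRQISFVKSHFA"

identity = percent_identity(SEQ_A, SEQ_B)
print(identity, identity >= 95.0)
```

Other conventions divide by the length of the shorter sequence or exclude gapped columns entirely, so the same pair of sequences can score differently under a different definition.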

In preferred embodiments, the isolated polypeptide encoded by a polynucleotide comprising a nucleotide sequence having at least about 95% identity to a nucleotide sequence chosen from one of SEQ ID NO: 1 through SEQ ID NO: 215 or SEQ ID NO: 431 through SEQ ID NO: 591, a degenerate variant thereof, or a fragment thereof, is further defined as a Streptococcus pneumoniae polypeptide having 0, 1 or 2 transmembrane domains, a Streptococcus pneumoniae polypeptide having 3 or more transmembrane domains, a Streptococcus pneumoniae polypeptide having an outer membrane domain or a periplasmic domain, a Streptococcus pneumoniae polypeptide having an inner membrane domain, a Streptococcus pneumoniae polypeptide identified by Blastp analysis, a Streptococcus pneumoniae polypeptide identified by Pfam analysis, a Streptococcus pneumoniae lipoprotein, a Streptococcus pneumoniae polypeptide having a LPXTG motif, wherein the polypeptide is covalently attached to the peptidoglycan layer, a Streptococcus pneumoniae polypeptide having a peptidoglycan binding motif, wherein the polypeptide is associated with the peptidoglycan layer, a Streptococcus pneumoniae polypeptide having a signal sequence and a C-terminal Tyrosine or Phenylalanine amino acid, a Streptococcus pneumoniae polypeptide having a tripeptide RGD sequence, a Streptococcus pneumoniae polypeptide identified by proteomics as surface exposed or a Streptococcus pneumoniae polypeptide identified by proteomics as membrane associated, where each of these groups has the set of ORFs identified above as within SEQ ID NO: 1 through SEQ ID NO: 215 or SEQ ID NO: 431 through SEQ ID NO: 591.

In a particularly preferred embodiment, an isolated polypeptide comprises an amino acid sequence having at least about 95% identity to an amino acid sequence chosen from one of SEQ ID NO: 216 through SEQ ID NO: 430 or SEQ ID NO: 592 through SEQ ID NO: 752, a biological equivalent thereof, or a fragment thereof. In another embodiment, the polypeptide is a fusion polypeptide. In a particularly preferred embodiment, the polypeptide immunoreacts with seropositive serum of an individual infected with Streptococcus pneumoniae. In yet other preferred embodiments, the polypeptide is further defined as a Streptococcus pneumoniae polypeptide having 0, 1 or 2 transmembrane domains, a Streptococcus pneumoniae polypeptide having 3 or more transmembrane domains, a Streptococcus pneumoniae polypeptide having an outer membrane domain or a periplasmic domain, a Streptococcus pneumoniae polypeptide having an inner membrane domain, a Streptococcus pneumoniae polypeptide identified by Blastp analysis, a Streptococcus pneumoniae polypeptide identified by Pfam analysis, a Streptococcus pneumoniae lipoprotein, a Streptococcus pneumoniae polypeptide having a LPXTG motif, wherein the polypeptide is covalently attached to the peptidoglycan layer, a Streptococcus pneumoniae polypeptide having a peptidoglycan binding motif, wherein the polypeptide is associated with the peptidoglycan layer, a Streptococcus pneumoniae polypeptide having a signal sequence and a C-terminal Tyrosine or Phenylalanine amino acid, a Streptococcus pneumoniae polypeptide having a tripeptide RGD sequence, a Streptococcus pneumoniae polypeptide identified by proteomics as surface exposed or a Streptococcus pneumoniae polypeptide identified by proteomics as membrane associated.

In a preferred embodiment, a polypeptide having 0, 1 or 2 transmembrane domains comprises an amino acid sequence chosen from one of SEQ ID NO: 216, SEQ ID NO: 218, SEQ ID NO: 219, SEQ ID NO: 222, SEQ ID NO: 223, SEQ ID NO: 224, SEQ ID NO: 226, SEQ ID NO: 228, SEQ ID NO: 231, SEQ ID NO: 232, SEQ ID NO: 233, SEQ ID NO: 234, SEQ ID NO: 237, SEQ ID NO: 238, SEQ ID NO: 239, SEQ ID NO: 240, SEQ ID NO: 243, SEQ ID NO: 244, SEQ ID NO: 247, SEQ ID NO: 249, SEQ ID NO: 251, SEQ ID NO: 254, SEQ ID NO: 256, SEQ ID NO: 257, SEQ ID NO: 260, SEQ ID NO: 262, SEQ ID NO: 264, SEQ ID NO: 265, SEQ ID NO: 266, SEQ ID NO: 268, SEQ ID NO: 270, SEQ ID NO: 272, SEQ ID NO: 273, SEQ ID NO: 275, SEQ ID NO: 276, SEQ ID NO: 277, SEQ ID NO: 278, SEQ ID NO: 279, SEQ ID NO: 281, SEQ ID NO: 282, SEQ ID NO: 283, SEQ ID NO: 284, SEQ ID NO: 285, SEQ ID NO: 286, SEQ ID NO: 287, SEQ ID NO: 289, SEQ ID NO: 293, SEQ ID NO: 294, SEQ ID NO: 296, SEQ ID NO: 298, SEQ ID NO: 300, SEQ ID NO: 301, SEQ ID NO: 304, SEQ ID NO: 306, SEQ ID NO: 307, SEQ ID NO: 310, SEQ ID NO: 311, SEQ ID NO: 312, SEQ ID NO: 315, SEQ ID NO: 319, SEQ ID NO: 320, SEQ ID NO: 321, SEQ ID NO: 324, SEQ ID NO: 325, SEQ ID NO: 326, SEQ ID NO: 328, SEQ ID NO: 331, SEQ ID NO: 336, SEQ ID NO: 337, SEQ ID NO: 338, SEQ ID NO: 340, SEQ ID NO: 341, SEQ ID NO: 342, SEQ ID NO: 343, SEQ ID NO: 346, SEQ ID NO: 347, SEQ ID NO: 349, SEQ ID NO: 351, SEQ ID NO: 352, SEQ ID NO: 353, SEQ ID NO: 356, SEQ ID NO: 357, SEQ ID NO: 358, SEQ ID NO: 359, SEQ ID NO: 362, SEQ ID NO: 363, SEQ ID NO: 364, SEQ ID NO: 365, SEQ ID NO: 370, SEQ ID NO: 371, SEQ ID NO: 373, SEQ ID NO: 376, SEQ ID NO: 377, SEQ ID NO: 380, SEQ ID NO: 385, SEQ ID NO: 386, SEQ ID NO: 387, SEQ ID NO: 389, SEQ ID NO: 391, SEQ ID NO: 394, SEQ ID NO: 398, SEQ ID NO: 400, SEQ ID NO: 402, SEQ ID NO: 407, SEQ ID NO: 410, SEQ ID NO: 411, SEQ ID NO: 412, SEQ ID NO: 414, SEQ ID NO: 415, SEQ ID NO: 416, SEQ ID NO: 417, SEQ ID NO: 419, SEQ ID NO: 420, SEQ ID NO: 422, SEQ ID NO: 424, SEQ ID NO: 425, a biological equivalent thereof, or a fragment thereof.

In another preferred embodiment, a polypeptide having 3 or more transmembrane domains comprises an amino acid sequence chosen from one of SEQ ID NO: 217, SEQ ID NO: 220, SEQ ID NO: 221, SEQ ID NO: 225, SEQ ID NO: 227, SEQ ID NO: 229, SEQ ID NO: 230, SEQ ID NO: 235, SEQ ID NO: 236, SEQ ID NO: 241, SEQ ID NO: 242, SEQ ID NO: 245, SEQ ID NO: 246, SEQ ID NO: 248, SEQ ID NO: 250, SEQ ID NO: 252, SEQ ID NO: 253, SEQ ID NO: 255, SEQ ID NO: 258, SEQ ID NO: 259, SEQ ID NO: 261, SEQ ID NO: 263, SEQ ID NO: 267, SEQ ID NO: 269, SEQ ID NO: 271, SEQ ID NO: 274, SEQ ID NO: 280, SEQ ID NO: 286, SEQ ID NO: 290, SEQ ID NO: 291, SEQ ID NO: 292, SEQ ID NO: 295, SEQ ID NO: 297, SEQ ID NO: 299, SEQ ID NO: 302, SEQ ID NO: 303, SEQ ID NO: 305, SEQ ID NO: 308, SEQ ID NO: 309, SEQ ID NO: 313, SEQ ID NO: 314, SEQ ID NO: 316, SEQ ID NO: 317, SEQ ID NO: 318, SEQ ID NO: 322, SEQ ID NO: 323, SEQ ID NO: 327, SEQ ID NO: 329, SEQ ID NO: 330, SEQ ID NO: 332, SEQ ID NO: 333, SEQ ID NO: 334, SEQ ID NO: 335, SEQ ID NO: 339, SEQ ID NO: 344, SEQ ID NO: 345, SEQ ID NO: 348, SEQ ID NO: 350, SEQ ID NO: 354, SEQ ID NO: 355, SEQ ID NO: 360, SEQ ID NO: 361, SEQ ID NO: 366, SEQ ID NO: 367, SEQ ID NO: 368, SEQ ID NO: 369, SEQ ID NO: 372, SEQ ID NO: 374, SEQ ID NO: 375, SEQ ID NO: 378, SEQ ID NO: 379, SEQ ID NO: 381, SEQ ID NO: 382, SEQ ID NO: 383, SEQ ID NO: 384, SEQ ID NO: 388, SEQ ID NO: 390, SEQ ID NO: 392, SEQ ID NO: 393, SEQ ID NO: 395, SEQ ID NO: 396, SEQ ID NO: 397, SEQ ID NO: 399, SEQ ID NO: 401, SEQ ID NO: 403, SEQ ID NO: 404, SEQ ID NO: 405, SEQ ID NO: 406, SEQ ID NO: 408, SEQ ID NO: 409, SEQ ID NO: 413, SEQ ID NO: 418, SEQ ID NO: 421, SEQ ID NO: 423, SEQ ID NO: 426, SEQ ID NO: 427, SEQ ID NO: 428, SEQ ID NO: 429, SEQ ID NO: 430, a biological equivalent thereof, or a fragment thereof.

In yet other preferred embodiments, a polypeptide having an outer membrane domain or a periplasmic domain comprises an amino acid sequence chosen from one of SEQ ID NO: 218, SEQ ID NO: 223, SEQ ID NO: 224, SEQ ID NO: 238, SEQ ID NO: 254, SEQ ID NO: 265, SEQ ID NO: 277, SEQ ID NO: 282, SEQ ID NO: 293, SEQ ID NO: 300, SEQ ID NO: 340, SEQ ID NO: 349, SEQ ID NO: 362, SEQ ID NO: 380, SEQ ID NO: 387, SEQ ID NO: 394, a biological equivalent thereof, or a fragment thereof.

In yet other preferred embodiments, a polynucleotide encoding a polypeptide having an inner membrane domain comprises an amino acid sequence chosen from one of SEQ ID NO: 217, SEQ ID NO: 220, SEQ ID NO: 221, SEQ ID NO: 222, SEQ ID NO: 225, SEQ ID NO: 226, SEQ ID NO: 227, SEQ ID NO: 228, SEQ ID NO: 229, SEQ ID NO: 230, SEQ ID NO: 231, SEQ ID NO: 232, SEQ ID NO: 234, SEQ ID NO: 235, SEQ ID NO: 236, SEQ ID NO: 237, SEQ ID NO: 241, SEQ ID NO: 242, SEQ ID NO: 243, SEQ ID NO: 244, SEQ ID NO: 245, SEQ ID NO: 246, SEQ ID NO: 247, SEQ ID NO: 248, SEQ ID NO: 249, SEQ ID NO: 250, SEQ ID NO: 251, SEQ ID NO: 252, SEQ ID NO: 253, SEQ ID NO: 255, SEQ ID NO: 258, SEQ ID NO: 259, SEQ ID NO: 261, SEQ ID NO: 262, SEQ ID NO: 263, SEQ ID NO: 266, SEQ ID NO: 267, SEQ ID NO: 268, SEQ ID NO: 269, SEQ ID NO: 271, SEQ ID NO: 274, SEQ ID NO: 275, SEQ ID NO: 276, SEQ ID NO: 280, SEQ ID NO: 283, SEQ ID NO: 284, SEQ ID NO: 285, SEQ ID NO: 286, SEQ ID NO: 288, SEQ ID NO: 290, SEQ ID NO: 291, SEQ ID NO: 292, SEQ ID NO: 294, SEQ ID NO: 295, SEQ ID NO: 296, SEQ ID NO: 297, SEQ ID NO: 298, SEQ ID NO: 299, SEQ ID NO: 301, SEQ ID NO: 302, SEQ ID NO: 303, SEQ ID NO: 305, SEQ ID NO: 306, SEQ ID NO: 308, SEQ ID NO: 309, SEQ ID NO: 310, SEQ ID NO: 311, SEQ ID NO: 312, SEQ ID NO: 313, SEQ ID NO: 314, SEQ ID NO: 315, SEQ ID NO: 316, SEQ ID NO: 317, SEQ ID NO: 318, SEQ ID NO: 320, SEQ ID NO: 321, SEQ ID NO: 322, SEQ ID NO: 323, SEQ ID NO: 324, SEQ ID NO: 327, SEQ ID NO: 328, SEQ ID NO: 329, SEQ ID NO: 330, SEQ ID NO: 332, SEQ ID NO: 333, SEQ ID NO: 334, SEQ ID NO: 335, SEQ ID NO: 336, SEQ ID NO: 337, SEQ ID NO: 338, SEQ ID NO: 339, SEQ ID NO: 341, SEQ ID NO: 342, SEQ ID NO: 343, SEQ ID NO: 344, SEQ ID NO: 345, SEQ ID NO: 346, SEQ ID NO: 347, SEQ ID NO: 348, SEQ ID NO: 350, SEQ ID NO: 351, SEQ ID NO: 354, SEQ ID NO: 355, SEQ ID NO: 356, SEQ ID NO: 357, SEQ ID NO: 359, SEQ ID NO: 360, SEQ ID NO: 361, SEQ ID NO: 362, SEQ ID NO: 365, SEQ ID NO: 366, SEQ ID NO: 367, SEQ ID NO: 368, SEQ ID NO: 369, SEQ ID NO: 371, SEQ ID NO: 372, SEQ ID NO: 373, SEQ ID NO: 374, SEQ ID NO: 375, SEQ ID NO: 377, SEQ ID NO: 378, SEQ ID NO: 379, SEQ ID NO: 381, SEQ ID NO: 382, SEQ ID NO: 383, SEQ ID NO: 384, SEQ ID NO: 385, SEQ ID NO: 388, SEQ ID NO: 390, SEQ ID NO: 391, SEQ ID NO: 392, SEQ ID NO: 393, SEQ ID NO: 395, SEQ ID NO: 396, SEQ ID NO: 397, SEQ ID NO: 399, SEQ ID NO: 401, SEQ ID NO: 402, SEQ ID NO: 403, SEQ ID NO: 404, SEQ ID NO: 405, SEQ ID NO: 406, SEQ ID NO: 407, SEQ ID NO: 408, SEQ ID NO: 409, SEQ ID NO: 410, SEQ ID NO: 413, SEQ ID NO: 415, SEQ ID NO: 418, SEQ ID NO: 421, SEQ ID NO: 423, SEQ ID NO: 424, SEQ ID NO: 426, SEQ ID NO: 427, SEQ ID NO: 428, SEQ ID NO: 429, SEQ ID NO: 430, a biological equivalent thereof, or a fragment thereof.

In still another preferred embodiment, a polypeptide identified by Blastp analysis comprises an amino acid sequence chosen from one of SEQ ID NO: 216, SEQ ID NO: 217, SEQ ID NO: 222, SEQ ID NO: 225, SEQ ID NO: 227, SEQ ID NO: 231, SEQ ID NO: 235, SEQ ID NO: 239, SEQ ID NO: 242, SEQ ID NO: 245, SEQ ID NO: 246, SEQ ID NO: 247, SEQ ID NO: 248, SEQ ID NO: 249, SEQ ID NO: 250, SEQ ID NO: 253, SEQ ID NO: 255, SEQ ID NO: 257, SEQ ID NO: 258, SEQ ID NO: 259, SEQ ID NO: 263, SEQ ID NO: 266, SEQ ID NO: 268, SEQ ID NO: 269, SEQ ID NO: 275, SEQ ID NO: 276, SEQ ID NO: 280, SEQ ID NO: 282, SEQ ID NO: 283, SEQ ID NO: 284, SEQ ID NO: 285, SEQ ID NO: 286, SEQ ID NO: 290, SEQ ID NO: 291, SEQ ID NO: 292, SEQ ID NO: 293, SEQ ID NO: 294, SEQ ID NO: 295, SEQ ID NO: 302, SEQ ID NO: 303, SEQ ID NO: 305, SEQ ID NO: 309, SEQ ID NO: 310, SEQ ID NO: 311, SEQ ID NO: 313, SEQ ID NO: 315, SEQ ID NO: 318, SEQ ID NO: 320, SEQ ID NO: 322, SEQ ID NO: 323, SEQ ID NO: 324, SEQ ID NO: 327, SEQ ID NO: 328, SEQ ID NO: 330, SEQ ID NO: 332, SEQ ID NO: 333, SEQ ID NO: 337, SEQ ID NO: 338, SEQ ID NO: 339, SEQ ID NO: 342, SEQ ID NO: 344, SEQ ID NO: 346, SEQ ID NO: 347, SEQ ID NO: 348, SEQ ID NO: 349, SEQ ID NO: 350, SEQ ID NO: 351, SEQ ID NO: 353, SEQ ID NO: 354, SEQ ID NO: 356, SEQ ID NO: 359, SEQ ID NO: 361, SEQ ID NO: 362, SEQ ID NO: 366, SEQ ID NO: 367, SEQ ID NO: 369, SEQ ID NO: 370, SEQ ID NO: 372, SEQ ID NO: 373, SEQ ID NO: 374, SEQ ID NO: 375, SEQ ID NO: 376, SEQ ID NO: 377, SEQ ID NO: 378, SEQ ID NO: 380, SEQ ID NO: 381, SEQ ID NO: 382, SEQ ID NO: 384, SEQ ID NO: 387, SEQ ID NO: 388, SEQ ID NO: 391, SEQ ID NO: 392, SEQ ID NO: 393, SEQ ID NO: 395, SEQ ID NO: 396, SEQ ID NO: 397, SEQ ID NO: 399, SEQ ID NO: 400, SEQ ID NO: 401, SEQ ID NO: 403, SEQ ID NO: 404, SEQ ID NO: 406, SEQ ID NO: 408, SEQ ID NO: 411, SEQ ID NO: 412, SEQ ID NO: 413, SEQ ID NO: 414, SEQ ID NO: 415, SEQ ID NO: 416, SEQ ID NO: 417, SEQ ID NO: 419, SEQ ID NO: 420, SEQ ID NO: 421, SEQ ID NO: 422, SEQ ID NO: 423, SEQ ID NO: 425, SEQ ID NO: 427, SEQ ID NO: 428, SEQ ID NO: 429, a biological equivalent thereof, or a fragment thereof.

In other preferred embodiments, a polypeptide identified by Pfam analysis comprises an amino acid sequence chosen from one of SEQ ID NO: 219, SEQ ID NO: 233, SEQ ID NO: 234, SEQ ID NO: 255, SEQ ID NO: 260, SEQ ID NO: 270, SEQ ID NO: 272, SEQ ID NO: 273, SEQ ID NO: 278, SEQ ID NO: 279, SEQ ID NO: 281, SEQ ID NO: 287, SEQ ID NO: 289, SEQ ID NO: 304, SEQ ID NO: 307, SEQ ID NO: 319, SEQ ID NO: 326, SEQ ID NO: 331, SEQ ID NO: 334, SEQ ID NO: 343, SEQ ID NO: 352, SEQ ID NO: 357, SEQ ID NO: 358, SEQ ID NO: 364, SEQ ID NO: 366, SEQ ID NO: 367, SEQ ID NO: 368, SEQ ID NO: 372, SEQ ID NO: 374, SEQ ID NO: 375, SEQ ID NO: 377, SEQ ID NO: 378, SEQ ID NO: 379, SEQ ID NO: 380, SEQ ID NO: 381, SEQ ID NO: 384, SEQ ID NO: 386, SEQ ID NO: 389, SEQ ID NO: 391, SEQ ID NO: 395, SEQ ID NO: 397, SEQ ID NO: 398, SEQ ID NO: 399, SEQ ID NO: 401, SEQ ID NO: 403, SEQ ID NO: 404, SEQ ID NO: 410, SEQ ID NO: 413, SEQ ID NO: 414, SEQ ID NO: 420, SEQ ID NO: 427, SEQ ID NO: 428, a biological equivalent thereof, or a fragment thereof.

In one preferred embodiment, a polypeptide is a lipoprotein and comprises an amino acid sequence chosen from one of SEQ ID NO: 218, SEQ ID NO: 223, SEQ ID NO: 224, SEQ ID NO: 228, SEQ ID NO: 236, SEQ ID NO: 241, SEQ ID NO: 249, SEQ ID NO: 277, SEQ ID NO: 282, SEQ ID NO: 300, SEQ ID NO: 349, SEQ ID NO: 362, SEQ ID NO: 365, SEQ ID NO: 383, SEQ ID NO: 385, SEQ ID NO: 388, a biological equivalent thereof, or a fragment thereof.

In certain other preferred embodiments, a polypeptide having a LPXTG motif and covalently attached to the peptidoglycan layer comprises an amino acid sequence chosen from one of SEQ ID NO: 228, SEQ ID NO: 236, SEQ ID NO: 249, SEQ, SEQ ID NO: 385, a biological equivalent thereof, or a fragment thereof; or a polypeptide having a peptidoglycan binding motif and associated with the peptidoglycan layer comprises an amino acid sequence chosen from one of SEQ ID NO: 240, SEQ ID NO: 264, SEQ ID NO: 325, a biological equivalent thereof, or a fragment thereof.

In another preferred embodiment, a polypeptide having a signal sequence and a C-terminal Tyrosine or Phenylalanine amino acid comprises an amino acid sequence chosen from one of SEQ ID NO: 226, SEQ ID NO: 254, SEQ ID NO: 289, SEQ ID NO: 312, SEQ ID NO: 321, SEQ ID NO: 340, SEQ ID NO: 402, a biological equivalent thereof, or a fragment thereof.

In yet another preferred embodiment, a polypeptide having a tripeptide RGD sequence that potentially is involved in cell attachment comprises an amino acid sequence chosen from one of SEQ ID NO: 216, SEQ ID NO: 236, SEQ ID NO: 281, SEQ ID NO: 282, a biological equivalent thereof, or a fragment thereof.

In still another embodiment, a polypeptide identified by proteomics as surface exposed comprises an amino acid sequence chosen from one of SEQ ID NO: 229, SEQ ID NO: 231, SEQ ID NO: 232, SEQ ID NO: 261, SEQ ID NO: 279, SEQ ID NO: 281, SEQ ID NO: 282, SEQ ID NO: 284, SEQ ID NO: 286, SEQ ID NO: 289, SEQ ID NO: 306, SEQ ID NO: 318, SEQ ID NO: 331, SEQ ID NO: 343, SEQ ID NO: 346, SEQ ID NO: 351, SEQ ID NO: 366, SEQ ID NO: 371, SEQ ID NO: 374, SEQ ID NO: 377, SEQ ID NO: 379, SEQ ID NO: 387, SEQ ID NO: 391, SEQ ID NO: 393, SEQ ID NO: 394, SEQ ID NO: 395, SEQ ID NO: 397, SEQ ID NO: 420, a biological equivalent thereof, or a fragment thereof.

In yet another embodiment, a polypeptide identified by proteomics as membrane associated comprises an amino acid sequence chosen from one of SEQ ID NO: 592 through SEQ ID NO: 752, a biological equivalent thereof, or a fragment thereof.

In another aspect of the invention, the polypeptides are expressed and purified in a recombinant expression system. Thus, in certain embodiments, the invention provides a recombinant expression vector comprising a nucleotide sequence having at least about 95% identity to a nucleotide sequence chosen from one of SEQ ID NO: 1 through SEQ ID NO: 215 or SEQ ID NO: 431 through SEQ ID NO: 591, a degenerate variant thereof, or a fragment thereof. In certain other embodiments, the polynucleotide is selected from the group consisting of DNA, chromosomal DNA, cDNA, RNA and antisense RNA. In another embodiment, the polynucleotide comprised within the vector further comprises heterologous nucleotide sequences. In other embodiments, the polynucleotide is operatively linked to one or more gene expression regulatory elements. In yet other embodiments, the polynucleotide encodes a polypeptide comprising an amino acid sequence having at least about 95% identity to an amino acid sequence chosen from one of SEQ ID NO: 216 through SEQ ID NO: 430 or SEQ ID NO: 592 through SEQ ID NO: 752, a biological equivalent thereof, or a fragment thereof. In a preferred embodiment, the vector is a plasmid.

In another aspect of the invention, there is provided a genetically engineered host cell, transfected, transformed or infected with a recombinant expression vector comprising a nucleotide sequence having at least about 95% identity to a nucleotide sequence chosen from one of SEQ ID NO: 1 through SEQ ID NO: 215 or SEQ ID NO: 431 through SEQ ID NO: 591, a degenerate variant thereof, or a fragment thereof. In a preferred embodiment, the host cell is a bacterial cell. In a further embodiment, the polynucleotide is expressed under suitable conditions to produce the encoded polypeptide, a biological equivalent thereof, or a fragment thereof, which is then recovered.

In other embodiments, the present invention provides an antibody specific for a Streptococcus pneumoniae polynucleotide chosen from one of SEQ ID NO: 1 through SEQ ID NO: 215 or SEQ ID NO: 431 through SEQ ID NO: 591, a fragment thereof, a degenerate variant thereof, or an antibody specific for a Streptococcus pneumoniae polypeptide chosen from one of SEQ ID NO: 216 through SEQ ID NO: 430 or SEQ ID NO: 592 through SEQ ID NO: 752, a biological equivalent thereof, or a fragment thereof. In certain embodiments, the antibody is selected from the group consisting of monoclonal, polyclonal, chimeric, humanized and single chain. In a preferred embodiment, the antibody is monoclonal. In another preferred embodiment, the antibody is humanized.

The present invention further provides pharmaceutical compositions, in particular immunogenic compositions, for the prevention and/or treatment of bacterial infection. Thus, in one embodiment an immunogenic composition is provided comprising a polypeptide having an amino acid sequence chosen from one or more of SEQ ID NO: 216 through SEQ ID NO: 430 or SEQ ID NO: 592 through SEQ ID NO: 752, a biological equivalent thereof, or a fragment thereof. In certain embodiments, the composition further comprises a pharmaceutically acceptable carrier. In yet other embodiments, the immunogenic composition further comprises one or more adjuvants. In a preferred embodiment, the polypeptide of the immunogenic composition is further defined as a Streptococcus pneumoniae polypeptide having 0, 1 or 2 transmembrane domains, a Streptococcus pneumoniae polypeptide having 3 or more transmembrane domains, a Streptococcus pneumoniae polypeptide having an outer membrane domain or a periplasmic domain, a Streptococcus pneumoniae polypeptide having an inner membrane domain, a Streptococcus pneumoniae polypeptide identified by Blastp analysis, a Streptococcus pneumoniae polypeptide identified by Pfam analysis, a Streptococcus pneumoniae lipoprotein, a Streptococcus pneumoniae polypeptide having a LPXTG motif, wherein the polypeptide is covalently attached to the peptidoglycan layer, a Streptococcus pneumoniae polypeptide having a peptidoglycan binding motif, wherein the polypeptide is associated with the peptidoglycan layer, a Streptococcus pneumoniae polypeptide having a signal sequence and a C-terminal Tyrosine or Phenylalanine amino acid, a Streptococcus pneumoniae polypeptide having a tripeptide RGD sequence, a Streptococcus pneumoniae polypeptide identified by proteomics as surface exposed or a Streptococcus pneumoniae polypeptide identified by proteomics as membrane associated. In certain other embodiments, the immunogenic composition further comprises heterologous amino acids. In particular embodiments, the polypeptide is a fusion polypeptide.

In further embodiments, provided is an immunogenic composition comprising a polynucleotide having a nucleotide sequence chosen from one or more of SEQ ID NO: 1 through SEQ ID NO: 215 or SEQ ID NO: 431 through SEQ ID NO: 591, a degenerate variant thereof, or a fragment thereof, wherein the polynucleotide is comprised in an expression vector. In preferred embodiments, the vector is plasmid DNA. In another embodiment, the polynucleotide comprises heterologous nucleotides. In still other embodiments, the polynucleotide is operatively linked to one or more gene expression regulatory elements. In yet other embodiments, the polynucleotide directs the expression of a neutralizing epitope of Streptococcus pneumoniae. In preferred embodiments, the immunogenic composition further comprises one or more adjuvants.

Also provided is a pharmaceutical composition comprising a polypeptide and a pharmaceutically acceptable carrier, wherein the polypeptide comprises an amino acid sequence chosen from one of SEQ ID NO: 216 through SEQ ID NO: 430 or SEQ ID NO: 592 through SEQ ID NO: 752, a biological equivalent thereof, or a fragment thereof. In preferred embodiments, the polypeptide is further defined as a Streptococcus pneumoniae polypeptide having 0, 1 or 2 transmembrane domains, a Streptococcus pneumoniae polypeptide having 3 or more transmembrane domains, a Streptococcus pneumoniae polypeptide having an outer membrane domain or a periplasmic domain, a Streptococcus pneumoniae polypeptide having an inner membrane domain, a Streptococcus pneumoniae polypeptide identified by Blastp analysis, a Streptococcus pneumoniae polypeptide identified by Pfam analysis, a Streptococcus pneumoniae lipoprotein, a Streptococcus pneumoniae polypeptide having a LPXTG motif, wherein the polypeptide is covalently attached to the peptidoglycan layer, a Streptococcus pneumoniae polypeptide having a peptidoglycan binding motif, wherein the polypeptide is associated with the peptidoglycan layer, a Streptococcus pneumoniae polypeptide having a signal sequence and a C-terminal Tyrosine or Phenylalanine amino acid, a Streptococcus pneumoniae polypeptide having a tripeptide RGD sequence, a Streptococcus pneumoniae polypeptide identified by proteomics as surface exposed or a Streptococcus pneumoniae polypeptide identified by proteomics as membrane associated. In certain embodiments, the polypeptide further comprises heterologous amino acids. In still other embodiments, the polypeptide is a fusion polypeptide.

In another embodiment, a method of immunizing against Streptococcus pneumoniae is provided comprising administering to a host an immunizing amount of an immunogenic composition comprising one or more polypeptides and a pharmaceutically acceptable carrier, wherein the polypeptide comprises an amino acid sequence chosen from one or more of SEQ ID NO: 216 through SEQ ID NO: 430 or SEQ ID NO: 592 through SEQ ID NO: 752, a biological equivalent thereof, or a fragment thereof. In certain embodiments, the polypeptide is a fusion polypeptide. In other embodiments, the method further comprises administering an adjuvant.

Other embodiments of the invention provide a DNA chip comprising an array of polynucleotides, wherein at least one of the polynucleotides comprises a nucleotide sequence chosen from one of SEQ ID NO: 1 through SEQ ID NO: 215 or SEQ ID NO: 431 through SEQ ID NO: 591, a complement thereof, a degenerate variant thereof, or a fragment thereof.

Also provided is a protein chip comprising an array of polypeptides, wherein at least one of the polypeptides comprises an amino acid sequence chosen from one of SEQ ID NO: 216 through SEQ ID NO: 430 or SEQ ID NO: 592 through SEQ ID NO: 752, a biological equivalent thereof, or a fragment thereof.

The invention further provides methods of detecting Streptococcus pneumoniae polynucleotides and polypeptides as well as kits for diagnosing Streptococcus pneumoniae infection.

Other embodiments provide a method for the detection and/or identification of Streptococcus pneumoniae in a biological sample comprising contacting the sample with an oligonucleotide probe of a polynucleotide comprising the nucleotide sequence chosen from one of SEQ ID NO: 1 through SEQ ID NO: 215 or SEQ ID NO: 431 through SEQ ID NO: 591, a degenerate variant thereof, or a fragment thereof, under conditions permitting hybridization and detecting the presence of hybridization complexes in the sample, wherein hybridization complexes indicate the presence of Streptococcus pneumoniae in the sample.

Still other embodiments provide a method for the detection and/or identification of Streptococcus pneumoniae in a biological sample comprising a nucleotide sequence chosen from one of SEQ ID NO: 1 through SEQ ID NO: 215 or SEQ ID NO: 431 through SEQ ID NO: 591, a degenerate variant thereof, or a fragment thereof, in the presence of nucleotides and a polymerase enzyme under conditions permitting primer extension and detecting the presence of primer extension products in the sample, wherein extension products indicate the presence of Streptococcus pneumoniae in the sample.

Further embodiments provide a method for the detection and/or identification of Streptococcus pneumoniae in a biological sample comprising contacting the sample with an antibody specific for a polypeptide comprising an amino acid sequence chosen from one of SEQ ID NO: 216 through SEQ ID NO: 430 or SEQ ID NO: 592 through SEQ ID NO: 752, a biological equivalent thereof, or a fragment thereof, under conditions permitting immune complex formation and detecting the presence of immune complexes in the sample, wherein immune complexes indicate the presence of Streptococcus pneumoniae in the sample.

In certain embodiments, provided is a method for the detection and/or identification of antibodies to Streptococcus pneumoniae in a biological sample comprising contacting the sample with a polypeptide comprising an amino acid sequence chosen from one of SEQ ID NO: 216 through SEQ ID NO: 430 or SEQ ID NO: 592 through SEQ ID NO: 752, a biological equivalent thereof, or a fragment thereof, under conditions permitting immune complex formation and detecting the presence of immune complexes in the sample, wherein immune complexes indicate the presence of Streptococcus pneumoniae in the sample.

Other embodiments of the invention provide a kit comprising a container containing an isolated polynucleotide comprising a nucleotide sequence chosen from one of SEQ ID NO: 1 through SEQ ID NO: 215 or SEQ ID NO: 431 through SEQ ID NO: 591, a degenerate variant thereof, or a fragment thereof. In a preferred embodiment, the polynucleotide is a primer or a probe, wherein when the polynucleotide is a primer, the kit further comprises a container containing a polymerase. In another embodiment, the kit further comprises a container containing dNTP.

Provided further is a kit comprising a container containing an antibody that immunospecifically binds to a polypeptide comprising the amino acid sequence chosen from one of SEQ ID NO: 216 through SEQ ID NO: 430 or SEQ ID NO: 592 through SEQ ID NO: 752, a biological equivalent thereof, or a fragment thereof.

Provided also is a kit comprising a container containing an antibody that immunospecifically binds to a fusion polypeptide comprising at least the amino acid sequence chosen from one of SEQ ID NO: 216 through SEQ ID NO: 430 or SEQ ID NO: 592 through SEQ ID NO: 752, a biological equivalent thereof, or a fragment thereof.

In a preferred embodiment of the invention, provided is a genetically engineered host cell, transfected, transformed or infected with a recombinant expression vector comprising a nucleotide sequence having at least about 95% identity to a nucleotide sequence chosen from one of SEQ ID NO: 1 through SEQ ID NO: 215 or SEQ ID NO: 431 through SEQ ID NO: 591, a degenerate variant thereof, or a fragment thereof under conditions suitable to produce one of the polypeptides of SEQ ID NO: 216 through SEQ ID NO: 430 or SEQ ID NO: 592 through SEQ ID NO: 752; and recovering the polypeptide.

Other features and advantages of the invention will be apparent from the following detailed description, from the preferred embodiments thereof, and from the claims.

DETAILED DESCRIPTION OF THE INVENTION

The invention described hereinafter addresses the need for Streptococcus pneumoniae immunogenic compositions that effectively prevent or treat most or all of the disease caused by serotypes of Streptococcus pneumoniae. The invention further addresses the need for methods of diagnosing Streptococcus pneumoniae infection. The present invention has identified novel Streptococcus pneumoniae open reading frames, hereinafter ORFs, which encode antigenic polypeptides. More particularly, the newly identified ORFs encode polypeptides that are secreted, exposed, membrane associated or surface localized on Streptococcus pneumoniae, and thus serve as potential antigenic polypeptides in immunogenic compositions. Thus, in certain embodiments, the invention comprises Streptococcus pneumoniae polynucleotide ORFs encoding surface localized, exposed, secreted or membrane associated polypeptide antigens. The present invention therefore comprises, in other embodiments, these polypeptides, preferably antigenic polypeptides, encoded by the Streptococcus pneumoniae ORFs.

In other embodiments, the invention comprises vectors comprising ORF sequences and host cells or animals transformed, transfected or infected with these vectors. The invention also comprises transcriptional gene products of Streptococcus pneumoniae ORFs, such as, for example, mRNA, antisense RNA, antisense oligonucleotides and ribozyme molecules, which can be used to inhibit or control growth of the microorganism. The invention relates also to methods of detecting these nucleic acids or polypeptides and kits for diagnosing Streptococcus pneumoniae infection. The invention also relates to pharmaceutical compositions, in particular immunogenic compositions, for the prevention and/or treatment of bacterial infection, in particular infection caused by or exacerbated by Streptococcus pneumoniae. In particular embodiments, the immunogenic compositions are used for the treatment or prevention of systemic diseases which are induced or exacerbated by Streptococcus pneumoniae. In other embodiments, the immunogenic compositions are used for the treatment or prevention of non-systemic diseases, particularly otitis media, which are induced or exacerbated by Streptococcus pneumoniae.

A. Identifying ORFs within the Genomic Sequence of Streptococcus pneumoniae

The invention is directed in particular embodiments to the identification of polynucleotides, more particularly ORFs, that encode Streptococcus pneumoniae polypeptides. The availability of complete bacterial genome sequences has begun to play an important role in the identification of candidate antigens through genomics, transcriptional profiling, and proteomics, coupled with the information processing capabilities of bioinformatics (McAtee et al., 1998a; McAtee et al., 1998b; Pizza et al., 2000; Sonnenberg and Belisle, 1997; Weldingh et al., 1998; McAtee et al., 1998c). Currently, no more than approximately 60% of all ORFs within a bacterial genome have some match with a polypeptide whose function has been determined. This leaves approximately 40% of genomic ORFs uncharacterized. Thus, the inventors have analyzed the Streptococcus pneumoniae genome and utilized bioinformatic tools to identify novel ORFs encoding polypeptides of the present invention. In addition to genomic analysis, the inventors analyzed the Streptococcus pneumoniae membrane proteome component to identify novel ORFs and/or confirm ORFs encoding polypeptides of the present invention. As described below, the ORFs were analyzed for a variety of characteristics.

Specifically, an extensive genomic analysis was performed in silico of the Streptococcus pneumoniae type 4 genome from The Institute for Genomic Research (TIGR) using algorithms designed to identify genes that encode novel surface localized polypeptides or polypeptides with putative similarity to polypeptides of known interest in other organisms. Thus, a combined analysis of the Streptococcus pneumoniae genome, using a unique set of two ORF finder algorithms (i.e., GLIMMER (Salzberg et al., 1998) and the inventors' assignee's own program), produced 3,799 ORFs. The more stringent of the two ORF finders, Glimmer, produced 2,022 ORFs, while the assignee's ORF finder produced the most with 3,798 ORFs. There were 2,021 ORFs identified by both algorithms. The difference in results between the different ORF finders is primarily due to the particular start codons used by each program; however, Glimmer also incorporates some evaluation for a Shine-Dalgarno box and an interpolated Markov model. For the purposes here, all ORFs with common stop codons are given the same ORF designation and will be treated as if they are the same ORF. As used hereinafter, an ORF is defined as having one of three potential start site codons, ATG, GTG or TTG, and one of three potential stop codons, TAA, TAG or TGA. The lower limit of amino acid length selected as a cutoff (e.g., ~74 amino acids) may also cause the algorithms to overlook some reading frames. However, these “true” reading frames become an increasingly rare event as the ORFs become shorter.
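The ORF definition above (a start codon of ATG, GTG or TTG, a stop codon of TAA, TAG or TGA, a minimum length cutoff, and one designation per shared stop codon) can be illustrated with a minimal Python sketch. This is not GLIMMER and not the assignee's program; the function name find_orfs, the single-strand scan and the exact way the length cutoff is applied are assumptions made only for illustration.

```python
# Illustrative sketch only; not GLIMMER and not the assignee's ORF finder.
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}
MIN_AA = 74  # approximate cutoff quoted in the text


def find_orfs(seq, min_aa=MIN_AA):
    """Return {stop_index: [start_index, ...]} for the given strand of `seq`.

    Every in-frame start codon that reaches the same stop codon is grouped
    under that stop, mirroring the rule that ORFs sharing a stop codon
    receive one ORF designation. Only the forward strand is scanned here.
    """
    seq = seq.upper()
    orfs = {}
    for frame in range(3):
        pending_starts = []  # start codons seen since the last stop in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in START_CODONS:
                pending_starts.append(i)
            elif codon in STOP_CODONS:
                # Keep starts whose coding region (stop excluded) is long enough.
                kept = [s for s in pending_starts if (i - s) // 3 >= min_aa]
                if kept:
                    orfs[i] = kept
                pending_starts = []
    return orfs
```

A full analysis would also scan the reverse complement. As a consistency check on the counts reported above, 2,022 + 3,798 − 2,021 = 3,799, which agrees with the combined total of 3,799 ORFs.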

The initial annotation of the Streptococcus pneumoniae ORFs was performed using the Basic Local Alignment Search Tool (BLAST; version 2.0) Gapped search algorithm, Blastp, to identify homologous sequences (Altschul et al., 1997). A cutoff ‘e’ value of anything <e⁻¹⁰ was considered significant. The non-redundant protein sequence database used for the homology searches consisted of GenBank, SWISS-PROT (Bairoch and Apweiler, 2000), PIR (Barker et al., 2001), and TREMBL (Bairoch and Apweiler, 2000), whose database sequences are updated daily. In the present invention, ORFs with a Blastp result of >e⁻¹⁰ are considered to be unique to Streptococcus pneumoniae. Alternate quantitative expression values other than Blastp ‘e’, e.g., percent identity, may also be used to compare database sequences with the Streptococcus pneumoniae ORFs of the present invention.
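The cutoff rule above can be sketched as a small Python filter. The sketch assumes that the cutoff is the usual BLAST notation for 1e-10, that each ORF has already been reduced to its lowest Blastp e-value, and that an ORF with no reported hit counts as unique; the function name classify_orfs and the input format are hypothetical.

```python
E_VALUE_CUTOFF = 1e-10  # assumed reading of the cutoff stated in the text


def classify_orfs(best_e_values, cutoff=E_VALUE_CUTOFF):
    """Split ORFs into those with a significant hit and those treated as unique.

    `best_e_values` maps an ORF identifier to the lowest e-value among its
    Blastp hits, or to None when no hit was reported (an assumption here).
    """
    significant, unique = [], []
    for orf_id, e_value in best_e_values.items():
        if e_value is not None and e_value < cutoff:
            significant.append(orf_id)
        else:
            # The text does not say how a value exactly equal to the cutoff
            # is handled; this sketch places it with the unique ORFs.
            unique.append(orf_id)
    return significant, unique
```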

A keyword search of the entire BLAST results was carried out using known or suspected target genes for immunogenic compositions as well as words that identified the location of a protein or function.

Several parameters were used to determine grouping of the predicted Streptococcus pneumoniae polypeptides of the invention. For example, polypeptides destined for translocation across the cytoplasmic membrane encode a leader signal (also called signal sequence) composed of a central hydrophobic region flanked at the N-terminus by positively charged residues (Pugsley, 1993). A software program, called SignalP, which identifies signal peptides and their cleavage sites based on neural networks (Nielsen et al., 1997), was used in the present invention to analyze the amino acid sequence of an ORF for such a signal peptide. The first 60 N-terminal amino acids of each ORF were analyzed by SignalP using the Gram-positive software database. The output generated four separate values, maximum C, maximum Y, maximum S, and mean S. The S-score, or signal region, is the probability of the position belonging to the signal peptide. The C-score, or cleavage site, is the probability of the position being the first in the mature protein. The Y-score is the geometric average of the C-score and a smoothed derivative of the S-score. A conclusion of either a Yes or No is given next to each score. If all four conclusions are Yes, then a ‘YES’ is listed for that ORF; if three of the conclusions are Yes, then a ‘yes’ is listed for that ORF; if two of the conclusions are Yes, then a ‘maybe’ is listed for that ORF; for all other cases, a ‘no’ is listed for that ORF.
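The rule for collapsing the four SignalP conclusions into a single label per ORF can be written out as a short Python sketch. It does not call SignalP; it assumes the four Yes/No conclusions are already available as booleans, and the function name summarize_signalp is hypothetical.

```python
def summarize_signalp(max_c, max_y, max_s, mean_s):
    """Collapse four Yes/No conclusions into the label listed per ORF.

    Each argument is True when the corresponding conclusion (maximum C,
    maximum Y, maximum S, mean S) is Yes.
    """
    yes_count = sum(bool(flag) for flag in (max_c, max_y, max_s, mean_s))
    if yes_count == 4:
        return "YES"
    if yes_count == 3:
        return "yes"
    if yes_count == 2:
        return "maybe"
    return "no"
```

For example, an ORF with Yes for maximum C, maximum Y and maximum S but No for mean S would be listed as ‘yes’.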

To predict polypeptide localization in bacteria, the software program PSORT was used (Nakai, 1991). PSORT predicts localization of polypeptides to the ‘cytoplasm’, ‘periplasm’, and/or ‘cytoplasmic membrane’ for Gram-positive bacteria, as well as ‘outer membrane’ for Gram-negative bacteria. Transmembrane (TM) domains of polypeptides were analyzed using the software program TopPred II (Cserzo et al., 1997).

The Hidden Markov Model (HMM) Pfam database (Bateman, 2000) was used to identify Streptococcus pneumoniae proteins that may belong to an existing protein family. Keyword searching of this output was further used to help identify additional candidate antigens that may have been missed by the BLAST search criteria.

A computer algorithm, called HMM Lipo, was developed by inventors' assignee to predict lipoproteins using approximately 131 biologically proven bacterial lipoproteins. The protein sequence from the start of the protein to the cysteine amino acid, plus the next two additional amino acids, was used to generate the HMM (Eddy and Markov, 1996).
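The region used to train such a model can be illustrated with a minimal Python sketch. It only extracts the segment described above (N-terminus through the cysteine plus two further residues) and does not build an HMM. The function name lipo_training_segment is hypothetical, and the position of the relevant cysteine is assumed to be known for each proven lipoprotein.

```python
def lipo_training_segment(protein, cys_index):
    """Return residues from the N-terminus through the cysteine plus two more.

    `cys_index` is the 0-based position of the relevant cysteine, assumed
    to be supplied by the caller for each training sequence.
    """
    if protein[cys_index] != "C":
        raise ValueError("cys_index must point at a cysteine residue")
    return protein[:cys_index + 3]
```

With the made-up sequence "MKKLLAGCSSN" and the cysteine at index 7, the segment would be "MKKLLAGCSS".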

The inventors' assignee also developed an HMM using approximately 70 known prokaryotic proteins containing the LPXTG cell wall sorting signal, to predict cell wall proteins that are anchored to the peptidoglycan layer (Mazmanian et al., 1999; Navarre and Schneewind, 1999). The model used not only the LPXTG sequence, but also included two features of the downstream sequence: first, the hydrophobic transmembrane domain and, second, the positively charged carboxy terminus. There are also a number of proteins that interact, non-covalently, with the peptidoglycan layer and are distinct from the LPXTG protein class described above. These proteins seem to have a consensus sequence at their carboxy terminus (Koebnik, 1995). The inventors therefore developed and used an HMM of this region to identify any Streptococcus pneumoniae polypeptides that may fall into this class of proteins.
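The three features named above (the LPXTG sequence, a downstream hydrophobic stretch, and a positively charged carboxy terminus) can be approximated with a simple heuristic in Python. This is a deliberately crude stand-in for illustration, not the HMM described in the text; the residue sets, the 0.6 hydrophobic fraction, the 8-residue tail and the function name has_sorting_signal are all assumptions.

```python
import re

HYDROPHOBIC = set("AVLIMFWGC")  # assumed residue set for the sketch
POSITIVE = set("KR")            # lysine and arginine
LPXTG = re.compile(r"LP.TG")    # X is any residue


def has_sorting_signal(protein, min_hydrophobic=0.6, tail_len=8):
    """Heuristic check for an LPXTG-type cell wall sorting signal.

    Requires an LPXTG match followed by a mostly hydrophobic region and a
    C-terminal tail containing at least one positively charged residue.
    """
    for match in LPXTG.finditer(protein):
        downstream = protein[match.end():]
        if len(downstream) <= tail_len:
            continue
        tm_region, tail = downstream[:-tail_len], downstream[-tail_len:]
        fraction = sum(r in HYDROPHOBIC for r in tm_region) / len(tm_region)
        if fraction >= min_hydrophobic and any(r in POSITIVE for r in tail):
            return True
    return False
```

A trained HMM scores position-specific residue preferences and is far more discriminating than fixed thresholds; the heuristic only shows how the three features combine.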

Streptococcus pneumoniae ORFs encoding surface localized, exposed, or membrane associated polypeptides were also identified by proteomics (see Example 3). This proteomic analysis confirmed many of the Streptococcus pneumoniae ORFs identified by the above genomic analysis and further identified novel Streptococcus pneumoniae ORFs encoding membrane associated polypeptides.

The following Tables (i.e., Tables 1-12) represent 12 groups into which the ORFs identified according to the above characteristics of the present invention have been classified. Thus, all of the groups described below are ORFs comprised within the Streptococcus pneumoniae genome and identified as encoding putative surface localized, exposed, membrane associated or secreted polypeptides. These groups are not meant to limit the scope of the present invention, as analysis of additional ORF characteristics is also contemplated. These additional characteristics, e.g., an RGD sequence, may serve to further expand the total number of ORF groupings or to parse the presently identified ORFs into more defined groups, broader groups, narrower groups or group subsets. In addition, some ORFs will meet the criteria of more than one category, and will therefore appear in more than one of the following groups.

Listed in Table 1 are ORFs that comprise a cytoplasmic membrane signal sequence (i.e., a SignalP value of ‘YES’) and have one or fewer membrane spanning domains (MSD), as defined by the TopPred II program. Thirteen ORFs match these criteria and are considered to be surface exposed.

TABLE 1 ORFs encoding surface exposed polypeptides, SignalP value = ‘YES’ and ≦1 MSDs.
SEQ ID    ORF
11    190
17    403
23    469
39    790
50    935
70    1143
83    1475
91    1568
97    1724
128    2271
148    2621
179    3212
209    3600

Listed in Table 2 are ORFs that comprise a cytoplasmic membrane signal sequence (i.e., a SignalP value of ‘YES’) and an outer membrane (OM) or periplasmic (Peri) prediction value when analyzed via the program PSORT. Five ORFs match these criteria and are considered to be surface exposed.

TABLE 2 ORFs encoding surface exposed polypeptides, a SignalP value = ‘YES’ and a PSORT value of ‘OM or Peri’.
SEQ ID    ORF
23    469
39    790
50    935
125    2228
179    3212

Listed in Table 3 are ORFs that comprise a cytoplasmic membrane signal sequence (i.e., a SignalP value of ‘YES’) and have 2 or more membrane spanning domains (MSD), as defined by the TopPred II program. Twenty-two ORFs match these criteria and are considered to be surface exposed.

TABLE 3 ORFs encoding surface exposed polypeptides, a SignalP = ‘YES’ and ≦1 MSDs.
SEQ ID    ORF
11    190
13    339
17    403
23    469
34    640
39    790
50    935
70    1143
73    1207
83    1475
91    1568
97    1724
106    1947
121    2196
125    2228
126    2234
128    2271
148    2621
179    3212
187    3361
192    3384
209    3600

Listed in Table 4 are ORFs that comprise at least 3 of 4 SignalP values (i.e., a SignalP value of ‘yes’) and have 2 or more membrane spanning domains (MSD), as defined by the TopPred II program. Forty-nine ORFs match these criteria and are considered to be surface exposed.

TABLE 4 ORFs encoding surface exposed polypeptides, a SignalP = ‘yes’ and ≧2 MSDs.
SEQ ID    ORF
2    72
6    94
10    141
14    356
22    462
28    597
29    598
36    715
37    716
40    823
46    885
47    904
48    916
56    989
59    998
71    1178
77    1339
80    1412
81    1437
86    1493
87    1528
88    1530
93    1623
99    1816
101    1849
102    1863
105    1904
112    2026
114    2061
115    2112
120    2195
129    2304
133    2350
140    2470
145    2594
146    2613
152    2676
156    2838
168    3072
175    3141
180    3256
184    3340
188    3369
190    3373
194    3386
203    3558
211    3631
213    3770
215    3799

A keyword search of the Blastp data for putative surface exposed proteins identified 119 ORFs, which are listed in Table 5.

TABLE 5 ORFs encoding surface exposed polypeptides identified by keyword search of Blastp data.
SEQ ID    ORF
1    51
2    72
7    113
10    141
12    304
16    378
20    410
24    493
27    580
30    607
31    612
32    624
33    639
34    640
35    703
38    772
40    823
42    838
43    854
44    855
48    916
51    945
53    979
59    998
60    1013
61    1048
65    1072
67    1104
68    1117
69    1141
70    1143
71    1178
75    1244
76    1267
77    1339
78    1350
79    1410
80    1412
87    1528
88    1530
90    1560
94    1630
95    1632
96    1710
98    1765
100    1835
103    1864
105    1904
107    1966
108    1999
109    2001
112    2026
113    2027
115    2112
117    2132
118    2191
122    2198
123    2201
124    2215
127    2239
129    2304
131    2329
132    2348
133    2350
134    2352
135    2354
136    2385
138    2431
139    2452
141    2488
144    2591
146    2613
147    2615
151    2661
152    2676
154    2734
155    2814
157    2845
158    2847
159    2894
160    2969
161    2975
162    2979
163    2980
165    3039
166    3040
167    3060
169    3079
172    3107
173    3115
176    3167
177    3198
178    3209
180    3256
181    3262
182    3298
184    3340
185    3346
186    3349
188    3369
189    3372
191    3378
193    3385
196    3457
197    3473
198    3479
199    3480
200    3487
201    3493
202    3494
204    3568
205    3576
206    3578
207    3584
208    3585
210    3627
212    3669
213    3770
214    3789

HMM Pfam analysis helps identify ORFs encoding proteins with domains or amino acid patterns similar to proteins that belong to an existing protein family. A keyword search of the Pfam family classification for potential surface exposed proteins identified 52 ORFs, which are listed in Table 6.

TABLE 6 ORFs encoding surface exposed polypeptides identified by HMM Pfam analysis.
SEQ ID    ORF
4    79
18    404
19    406
41    828
45    869
55    983
57    992
58    996
63    1064
64    1070
66    1097
72    1179
74    1220
89    1559
92    1572
104    1868
111    2025
116    2129
119    2193
128    2271
137    2400
142    2499
143    2543
149    2642
151    2661
152    2676
153    2678
157    2845
159    2894
160    2969
162    2979
163    2980
164    2983
165    3039
166    3040
169    3079
171    3083
174    3140
176    3167
180    3256
182    3298
183    3327
184    3340
186    3349
188    3369
189    3372
195    3413
198    3479
199    3480
205    3576
212    3669
213    3770

An algorithm called HMM Lipo was developed for use in the present invention. The HMM Lipo program predicts lipoproteins using approximately 131 biologically proven bacterial lipoproteins. HMM Lipo identified 16 ORFs as putative lipoproteins, which are listed in Table 7.

TABLE 7 ORFs encoding surface exposed lipoproteins.
SEQ ID    ORF
3    75
8    132
9    140
13    339
21    423
26    502
34    640
62    1059
67    1104
85    1479
134    2352
147    2615
150    2655
168    3072
170    3081
173    3115

The inventors developed an HMM using approximately 70 known prokaryotic polypeptides containing the LPXTG cell wall sorting signal. This HMM was used to predict cell wall polypeptides that are anchored to the peptidoglycan layer. Listed in Table 8 are 4 ORFs that are predicted to have the LPXTG motif and are classified as proteins that might be targeted by sortase.

TABLE 8 ORFs encoding surface exposed polypeptides anchored to the peptidoglycan layer.
SEQ ID    ORF
13    339
21    423
34    640
170    3081

In addition, listed in Table 9 are 3 ORFs predicted by HMM PGB analysis to encode polypeptides potentially binding to the peptidoglycan layer in a sortase-independent manner.

TABLE 9 ORFs encoding surface exposed polypeptides non-covalently anchored to the peptidoglycan layer.
SEQ ID    ORF
25    494
49    927
110    2012

ORFs that give a SignalP value of ‘YES’ and whose carboxy terminal amino acid is either a phenylalanine or a tyrosine are considered to be surface exposed. Listed in Table 10 are 7 ORFs matching these criteria.

TABLE 10 ORFs encoding surface exposed polypeptides, a cytoplasmic membrane signal sequence (i.e., SignalP = ‘YES’) and a C-terminal Phe or Tyr amino acid.
SEQ ID    ORF
11    190
39    790
73    1207
97    1724
106    1947
125    2228
187    3361

Twenty-eight Streptococcus pneumoniae ORFs were additionally identified by proteomics as encoding membrane associated polypeptides and are listed in Table 11. The ORFs listed in Table 11 further support the Streptococcus pneumoniae ORFs identified by the genomic mining algorithms described above (i.e., ORFs encoding surface localized, secreted, or exposed polypeptides; Tables 1-10).

TABLE 11 Streptococcus pneumoniae ORFs confirmed by proteomics as surface exposed.
SEQ ID    ORF
14    356
16    378
17    403
46    885
64    1070
66    1097
67    1104
69    1141
71    1178
74    1220
91    1568
103    1864
116    2129
128    2271
131    2329
136    2385
151    2661
156    2838
159    2894
162    2979
164    2983
172    3107
176    3167
178    3209
179    3212
180    3256
182    3298
205    3576

Finally, 161 novel Streptococcus pneumoniae ORFs were identified by proteomics as encoding membrane associated polypeptides and are listed in Table 12.

TABLE 12 Streptococcus pneumoniae ORFs identified by proteomics as membrane associated.
SEQ ID    ORF
431    64
432    120
433    121
434    152
435    153
436    156
437    159
438    160
439    163
440    164
441    166
442    172
443    174
444    175
445    178
446    180
447    181
448    183
449    186
450    188
451    189
452    192
453    194
454    199
455    268
456    269
457    294
458    296
459    298
460    301
461    316
462    320
463    357
464    390
465    431
466    434
467    436
468    439
469    513
470    515
471    583
472    633
473    683
474    686
475    720
476    726
477    818
478    861
479    863
480    960
481    1004
482    1037
483    1049
484    1054
485    1061
486    1082
487    1105
488    1111
489    1175
490    1248
491    1262
492    1266
493    1312
494    1314
495    1344
496    1347
497    1356
498    1417
499    1465
500    1477
501    1515
502    1527
503    1565
504    1601
505    1606
506    1641
507    1770
508    1773
509    1774
510    1785
511    1803
512    1817
513    1823
514    1847
515    1917
516    1923
517    1964
518    1970
519    2039
520    2041
521    2047
522    2058
523    2068
524    2130
525    2251
526    2282
527    2284
528    2315
529    2317
530    2318
531    2319
532    2320
533    2372
534    2374
535    2376
536    2387
537    2394
538    2410
539    2425
540    2443
541    2451
542    2454
543    2508
544    2513
545    2542
546    2558
547    2568
548    2575
549    2587
550    2754
551    2800
552    2839
553    2892
554    2906
555    2958
556    2963
557    3021
558    3048
559    3065
560    3095
561    3111
562    3125
563    3151
564    3153
565    3161
566    3178
567    3180
568    3234
569    3248
570    3303
571    3331
572    3367
573    3410
574    3446
575    3454
576    3525
577    3538
578    3540
579    3552
580    3555
581    3560
582    3564
583    3566
584    3632
585    3653
586    3714
587    3732
588    3735
589    3739
590    3766
591    3778

As further contemplated in the present invention, Streptococcus pneumoniae ORFs are searched and evaluated for other important characteristics. For example, proteins that contain the Arg-Gly-Asp (RGD) attachment motif, together with integrins that serve as their receptor, constitute a major recognition system for cell adhesion, and thus are putative Streptococcus pneumoniae polypeptide antigens. Four Streptococcus pneumoniae ORFs, i.e., ORF 51, ORF 423, ORF 1097 and ORF 1104, have been identified as having a tripeptide RGD sequence that potentially is involved in cell attachment.

RGD recognition is one mechanism used by microbes to gain entry into eukaryotic tissues (Stockbauer et al., 1999; Isberg and Nhieu, 1994). However, not all RGD-containing proteins mediate cell attachment. It has been shown that RGD-containing peptides with a proline at the carboxy end (RGDP) are inactive in cell attachment assays (Pierschbacher and Ruoslahti, 1987), and such sequences are excluded. A tandem repeat finder (Benson, 1999) may also be used, as it has been used to identify ORFs containing repeated DNA sequences such as those found in MSCRAMMs (Foster and Hook, 1998) and phase variable surface proteins of Neisseria meningitidis (Parkhill et al., 2000).
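The RGD screen with the RGDP exclusion can be expressed as a single pattern with a negative lookahead. The following Python sketch is illustrative only; the function name `candidate_rgd_sites` and the toy sequence are assumptions and do not come from the specification.

```python
import re

# RGD tripeptide not immediately followed by proline (RGDP is excluded).
RGD_NOT_P = re.compile(r"RGD(?!P)")


def candidate_rgd_sites(protein):
    """Return 0-based start positions of RGD motifs that are not part of RGDP."""
    return [m.start() for m in RGD_NOT_P.finditer(protein.upper())]


if __name__ == "__main__":
    # Toy sequence with one RGD followed by S (kept) and one RGDP (excluded).
    print(candidate_rgd_sites("MKRGDSAARGDPLL"))  # [2]
```

A polypeptide would be carried forward as an RGD candidate only if the returned list is non-empty.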

The present inventors also have used the Geanfammer software to cluster proteins into homologous families (Park and Teichmann, 1998). Preliminary analysis of the family classes has provided novel ORFs within a vaccine candidate cluster as well as defining potential protein function.

The ORFs listed in Table 13 were identified by analysis of the Streptococcus pneumoniae genome. A total of 215 ORFs were identified based on the analysis criteria described above and listed in Tables 1-10. The 215 ORFs identified are listed vertically in Table 13 (column 1). The nucleotide SEQ ID NOS: 1 through 215 (column 2) and the encoded polypeptide SEQ ID NOS: 216 through 430 (column 3) are listed horizontally to their respective ORF. For example, in Table 13, ORF 51 has the nucleotide sequence of SEQ ID NO:1 and the encoded polypeptide has the amino acid sequence of SEQ ID NO: 216, ORF 72 has nucleotide SEQ ID NO:2 and encoded polypeptide SEQ ID NO: 217, etc.
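The numbering convention in Tables 13 and 14 is a fixed offset between each nucleotide SEQ ID NO and its encoded polypeptide SEQ ID NO: 215 for the first set (1 through 215 map to 216 through 430) and 161 for the second set (431 through 591 map to 592 through 752). A minimal Python sketch of that arithmetic follows; the function name is hypothetical.

```python
def polypeptide_seq_id(nucleotide_seq_id):
    """Map a nucleotide SEQ ID NO to the SEQ ID NO of its encoded polypeptide."""
    n = nucleotide_seq_id
    if 1 <= n <= 215:        # Table 13: 1..215 -> 216..430
        return n + 215
    if 431 <= n <= 591:      # Table 14: 431..591 -> 592..752
        return n + 161
    raise ValueError("not a nucleotide SEQ ID NO listed in Table 13 or Table 14")


if __name__ == "__main__":
    print(polypeptide_seq_id(1), polypeptide_seq_id(431))  # 216 592
```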

Proteomic analysis identified twenty-eight ORFs (see Table 11) already listed in Table 13 (e.g., SEQ ID NO: 14, SEQ ID NO: 16, SEQ ID NO: 17, etc.). Proteomic analysis further identified 161 novel ORFs encoding membrane associated proteins (see Table 12). These 161 novel ORFs identified by proteomics as membrane associated are listed vertically in Table 14 (column 1). The nucleotide SEQ ID NOS: 431 through 591 (column 2) and the encoded polypeptide SEQ ID NOS: 592 through 752 (column 3) are listed horizontally to their respective ORF.

TABLE 13 Streptococcus pneumoniae open reading frames (ORFs)
ORF    Nucleotide SEQ ID NO    Polypeptide SEQ ID NO
51    1    216
72    2    217
75    3    218
79    4    219
86    5    220
94    6    221
113    7    222
132    8    223
140    9    224
141    10    225
190    11    226
304    12    227
339    13    228
356    14    229
370    15    230
378    16    231
403    17    232
404    18    233
406    19    234
410    20    235
423    21    236
462    22    237
469    23    238
493    24    239
494    25    240
502    26    241
580    27    242
597    28    243
598    29    244
607    30    245
612    31    246
624    32    247
639    33    248
640    34    249
703    35    250
715    36    251
716    37    252
772    38    253
790    39    254
823    40    255
828    41    256
838    42    257
854    43    258
855    44    259
869    45    260
885    46    261
904    47    262
916    48    263
927    49    264
935    50    265
945    51    266
965    52    267
979    53    268
980    54    269
983    55    270
989    56    271
992    57    272
996    58    273
998    59    274
1013    60    275
1048    61    276
1059    62    277
1064    63    278
1070    64    279
1072    65    280
1097    66    281
1104    67    282
1117    68    283
1141    69    284
1143    70    285
1178    71    286
1179    72    287
1207    73    288
1220    74    289
1244    75    290
1267    76    291
1339    77    292
1350    78    293
1410    79    294
1412    80    295
1437    81    296
1459    82    297
1475    83    298
1476    84    299
1479    85    300
1493    86    301
1528    87    302
1530    88    303
1559    89    304
1560    90    305
1568    91    306
1572    92    307
1623    93    308
1630    94    309
1632    95    310
1710    96    311
1724    97    312
1765    98    313
1816    99    314
1835    100    315
1849    101    316
1863    102    317
1864    103    318
1868    104    319
1904    105    320
1947    106    321
1966    107    322
1999    108    323
2001    109    324
2012    110    325
2025    111    326
2026    112    327
2027    113    328
2061    114    329
2112    115    330
2129    116    331
2132    117    332
2191    118    333
2193    119    334
2195    120    335
2196    121    336
2198    122    337
2201    123    338
2215    124    339
2228    125    340
2234    126    341
2239    127    342
2271    128    343
2304    129    344
2322    130    345
2329    131    346
2348    132    347
2350    133    348
2352    134    349
2354    135    350
2385    136    351
2400    137    352
2431    138    353
2452    139    354
2470    140    355
2488    141    356
2499    142    357
2543    143    358
2591    144    359
2594    145    360
2613    146    361
2615    147    362
2621    148    363
2642    149    364
2655    150    365
2661    151    366
2676    152    367
2678    153    368
2734    154    369
2814    155    370
2838    156    371
2845    157    372
2847    158    373
2894    159    374
2969    160    375
2975    161    376
2979    162    377
2980    163    378
2983    164    379
3039    165    380
3040    166    381
3060    167    382
3072    168    383
3079    169    384
3081    170    385
3083    171    386
3107    172    387
3115    173    388
3140    174    389
3141    175    390
3167    176    391
3198    177    392
3209    178    393
3212    179    394
3256    180    395
3262    181    396
3298    182    397
3327    183    398
3340    184    399
3346    185    400
3349    186    401
3361    187    402
3369    188    403
3372    189    404
3373    190    405
3378    191    406
3384    192    407
3385    193    408
3386    194    409
3413    195    410
3457    196    411
3473    197    412
3479    198    413
3480    199    414
3487    200    415
3493    201    416
3494    202    417
3558    203    418
3568    204    419
3576    205    420
3578    206    421
3584    207    422
3585    208    423
3600    209    424
3627    210    425
3631    211    426
3669    212    427
3770    213    428
3789    214    429
3799    215    430

TABLE 14 Streptococcus pneumoniae open reading frames (ORFs)
ORF    Nucleotide SEQ ID NO    Polypeptide SEQ ID NO
64    431    592
120    432    593
121    433    594
152    434    595
153    435    596
156    436    597
159    437    598
160    438    599
163    439    600
164    440    601
166    441    602
172    442    603
174    443    604
175    444    605
178    445    606
180    446    607
181    447    608
183    448    609
186    449    610
188    450    611
189    451    612
192    452    613
194    453    614
199    454    615
268    455    616
269    456    617
294    457    618
296    458    619
298    459    620
301    460    621
316    461    622
320    462    623
357    463    624
390    464    625
431    465    626
434    466    627
436    467    628
439    468    629
513    469    630
515    470    631
583    471    632
633    472    633
683    473    634
686    474    635
720    475    636
726    476    637
818    477    638
861    478    639
863    479    640
960    480    641
1004    481    642
1037    482    643
1049    483    644
1054    484    645
1061    485    646
1082    486    647
1105    487    648
1111    488    649
1175    489    650
1248    490    651
1262    491    652
1266    492    653
1312    493    654
1314    494    655
1344    495    656
1347    496    657
1356    497    658
1417    498    659
1465    499    660
1477    500    661
1515    501    662
1527    502    663
1565    503    664
1601    504    665
1606    505    666
1641    506    667
1770    507    668
1773    508    669
1774    509    670
1785    510    671
1803    511    672
1817    512    673
1823    513    674
1847    514    675
1917    515    676
1923    516    677
1964    517    678
1970    518    679
2039    519    680
2041    520    681
2047    521    682
2058    522    683
2068    523    684
2130    524    685
2251    525    686
2282    526    687
2284    527    688
2315    528    689
2317    529    690
2318    530    691
2319    531    692
2320    532    693
2372    533    694
2374    534    695
2376    535    696
2387    536    697
2394    537    698
2410    538    699
2425    539    700
2443    540    701
2451    541    702
2454    542    703
2508    543    704
2513    544    705
2542    545    706
2558    546    707
2568    547    708
2575    548    709
2587    549    710
2754    550    711
2800    551    712
2839    552    713
2892    553    714
2906    554    715
2958    555    716
2963    556    717
3021    557    718
3048    558    719
3065    559    720
3095    560    721
3111    561    722
3125    562    723
3151    563    724
3153    564    725
3161    565    726
3178    566    727
3180    567    728
3234    568    729
3248    569    730
3303    570    731
3331    571    732
3367    572    733
3410    573    734
3446    574    735
3454    575    736
3525    576    737
3538    577    738
3540    578    739
3552    579    740
3555    580    741
3560    581    742
3564    582    743
3566    583    744
3632    584    745
3653    585    746
3714    586    747
3732    587    748
3735    588    749
3739    589    750
3766    590    751
3778    591    752

B. Streptococcus pneumoniae ORF Polynucleotides Encoding Surface Exposed Polypeptides

Isolated and purified Streptococcus pneumoniae ORF polynucleotides of the present invention are contemplated for use in the production of Streptococcus pneumoniae polypeptides. More specifically, in certain embodiments, the ORFs encode Streptococcus pneumoniae surface localized, exposed, membrane associated or secreted polypeptides, particularly antigenic polypeptides. Thus, in one aspect, the present invention provides isolated and purified polynucleotides (ORFs) that encode Streptococcus pneumoniae surface localized, exposed, membrane associated or secreted polypeptides. In particular embodiments, a polynucleotide of the present invention is a DNA molecule, wherein the DNA may be genomic DNA, chromosomal DNA, plasmid DNA or cDNA. In a preferred embodiment, a polynucleotide of the present invention is a recombinant polynucleotide, which encodes a Streptococcus pneumoniae polypeptide comprising an amino acid sequence that has at least 95% identity to an amino acid sequence of one of SEQ ID NO: 216 through SEQ ID NO: 430 or SEQ ID NO: 592 through SEQ ID NO: 752, or a fragment thereof. In another embodiment, an isolated and purified ORF polynucleotide comprises a nucleotide sequence that has at least 95% identity to one of the ORF nucleotide sequences of SEQ ID NO: 1 through SEQ ID NO: 215 or SEQ ID NO: 431 through SEQ ID NO: 591, a degenerate variant thereof, or a complement thereof. In a preferred embodiment, an ORF polynucleotide of one of SEQ ID NO: 1 through SEQ ID NO: 215 or SEQ ID NO: 431 through SEQ ID NO: 591 is comprised in a plasmid vector and expressed in a prokaryotic host cell.

As used hereinafter, the term “polynucleotide” means a sequence of nucleotides connected by phosphodiester linkages. Polynucleotides are presented hereinafter in the 5′ to 3′ direction. A polynucleotide of the present invention can comprise from about 10 to about several hundred thousand base pairs. Preferably, a polynucleotide comprises from about 10 to about 3,000 base pairs. Preferred lengths of particular polynucleotides are set forth hereinafter.

A polynucleotide of the present invention can be a deoxyribonucleic acid (DNA) molecule, a ribonucleic acid (RNA) molecule, or analogs of the DNA or RNA generated using nucleotide analogs. The nucleic acid molecule can be single-stranded or double-stranded, but preferably is double-stranded DNA. Where a polynucleotide is a DNA molecule, that molecule can be a gene, a cDNA molecule or a genomic DNA molecule. Nucleotide bases are indicated hereinafter by a single letter code: adenine (A), guanine (G), thymine (T), cytosine (C), inosine (I) and uracil (U).

“Isolated” means altered “by the hand of man” from the natural state. If an “isolated” composition or substance occurs in nature, it has been changed or removed from its original environment, or both. For example, a polynucleotide or a polypeptide naturally present in a living animal is not “isolated,” but the same polynucleotide or polypeptide separated from the coexisting materials of its natural state is “isolated,” as the term is employed hereinafter.

Preferably, an “isolated” polynucleotide is free of sequences which naturally flank the nucleic acid (i.e., sequences located at the 5′ and 3′ ends of the nucleic acid) in the genomic DNA of the organism from which the nucleic acid is derived. For example, in various embodiments, the isolated Streptococcus pneumoniae nucleic acid molecule can contain less than about 5 kb, 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb or 0.1 kb of nucleotide sequences which naturally flank the nucleic acid molecule in genomic DNA of the cell from which the nucleic acid is derived. However, the Streptococcus pneumoniae nucleic acid molecule can be fused to other protein encoding or regulatory sequences and still be considered isolated.

ORF polynucleotides of the present invention may be obtained, using standard cloning and screening techniques, from a cDNA library derived from mRNA. Polynucleotides of the invention can also be obtained from natural sources such as genomic DNA libraries (e.g., a Streptococcus pneumoniae library) or can be synthesized using well known and commercially available techniques. As contemplated in the present invention, ORF polynucleotides will be obtained using Streptococcus pneumoniae type 3, type 14 or type 19F chromosomal DNA as the template.

The invention further encompasses nucleic acid molecules that differ from the nucleotide sequences shown in SEQ ID NO:1 through SEQ ID NO:215 or SEQ ID NO: 431 through SEQ ID NO: 591 (and fragments thereof) due to degeneracy of the genetic code and thus encode the same Streptococcus pneumoniae polypeptide as that encoded by the nucleotide sequence shown in SEQ ID NO:1 through SEQ ID NO:215 or SEQ ID NO: 431 through SEQ ID NO: 591.
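Degeneracy of the genetic code means that two different nucleotide sequences can translate to the same polypeptide. The Python sketch below illustrates this with short toy sequences that are not taken from the sequence listing. It builds the standard codon table from the conventional TCAG ordering (the bacterial table assigns the same amino acids to these codons and differs in permitted start codons); the function names are assumptions made for the example.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG x TCAG x TCAG order ('*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence in frame 1, ignoring trailing partial codons."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def encode_same_polypeptide(dna_a, dna_b):
    """True if both sequences translate to the same amino acid sequence."""
    return translate(dna_a) == translate(dna_b)


if __name__ == "__main__":
    # CTG/TTA both encode Leu and AAA/AAG both encode Lys.
    print(translate("ATGCTGAAA"), translate("ATGTTAAAG"))
    print(encode_same_polypeptide("ATGCTGAAA", "ATGTTAAAG"))
```

Here the two nucleotide sequences differ at three positions yet both translate to Met-Leu-Lys, which is the sense in which a degenerate variant encodes the same polypeptide.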

Orthologues and allelic variants of the Streptococcus pneumoniae polynucleotides can readily be identified using methods well known in the art. Allelic variants and orthologues of the polynucleotides will comprise a nucleotide sequence that is typically at least about 70-75%, more typically at least about 80-85%, and most typically at least about 90-95% or more homologous to the nucleotide sequence shown in SEQ ID NO:1 through SEQ ID NO:215 or SEQ ID NO: 431 through SEQ ID NO: 591, or a fragment of these nucleotide sequences. Such nucleic acid molecules can readily be identified as being able to hybridize, preferably under stringent conditions, to the nucleotide sequence shown in SEQ ID NO:1 through SEQ ID NO:215 or SEQ ID NO: 431 through SEQ ID NO: 591, or a fragment of these nucleotide sequences.

Moreover, the polynucleotide of the invention can comprise only a fragment of the coding region of a Streptococcus pneumoniae polynucleotide or gene, such as a fragment of one of SEQ ID NO:1 through SEQ ID NO:215 or SEQ ID NO: 431 through SEQ ID NO: 591. Preferably, such fragments are immunogenic fragments.

When the ORF polynucleotides of the invention are used for the recombinant production of Streptococcus pneumoniae polypeptides of the present invention, the polynucleotide may include the coding sequence for the mature polypeptide, by itself, or the coding sequence for the mature polypeptide in reading frame with other coding sequences, such as those encoding a leader or secretory sequence, a pre-, pro- or prepro-protein sequence, or other fusion peptide portions. For example, a marker sequence which facilitates purification of the fused polypeptide can be linked to the coding sequence (see Gentz et al., 1989, incorporated by reference hereinafter in its entirety). Thus, contemplated in the present invention is the preparation of polynucleotides encoding fusion polypeptides permitting His-tag purification of expression products. The polynucleotide may also contain non-coding 5′ and 3′ sequences, such as transcribed, non-translated sequences, splicing and polyadenylation signals.

Thus, a polynucleotide encoding a polypeptide of the present invention, including homologs and orthologs from species other than Streptococcus pneumoniae, may be obtained by a process which comprises the steps of screening an appropriate library under stringent hybridization conditions with a labeled probe having the sequence of one of SEQ ID NO:1 through SEQ ID NO:215 or SEQ ID NO: 431 through SEQ ID NO: 591, or a fragment thereof; and isolating full-length cDNA and genomic clones containing the polynucleotide sequence. Such hybridization techniques are well known to the skilled artisan. The skilled artisan will appreciate that, in many cases, an isolated cDNA sequence will be incomplete, in that the region coding for the polypeptide is cut short at the 5′ end of the cDNA. This is a consequence of reverse transcriptase, an enzyme with inherently low “processivity” (a measure of the ability of the enzyme to remain attached to the template during the polymerization reaction), failing to complete a DNA copy of the mRNA template during first-strand cDNA synthesis.

Thus, in certain embodiments, the polynucleotide sequence information provided by the present invention allows for the preparation of relatively short DNA (or RNA) oligonucleotide sequences having the ability to specifically hybridize to gene sequences of the selected polynucleotides disclosed hereinafter. The term “oligonucleotide” as used hereinafter is defined as a molecule comprised of two or more deoxyribonucleotides or ribonucleotides, usually more than three (3), and typically more than ten (10) and up to one hundred (100) or more (although preferably between twenty and thirty). The exact size will depend on many factors, which in turn depend on the ultimate function or use of the oligonucleotide. Thus, in particular embodiments of the invention, nucleic acid probes of an appropriate length are prepared based on a consideration of a selected nucleotide sequence, e.g., a sequence such as that shown in SEQ ID NO:1 through SEQ ID NO:215 or SEQ ID NO: 431 through SEQ ID NO: 591. The ability of such nucleic acid probes to specifically hybridize to a polynucleotide encoding a Streptococcus pneumoniae polypeptide lends them particular utility in a variety of embodiments. Most importantly, the probes can be used in a variety of assays for detecting the presence of complementary sequences in a given sample.

In certain embodiments, it is advantageous to use oligonucleotide primers. These primers may be generated in any manner, including chemical synthesis, DNA replication, reverse transcription, or a combination thereof. The sequence of such primers is designed using a polynucleotide of the present invention for use in detecting, amplifying or mutating a defined segment of an ORF polynucleotide that encodes a Streptococcus pneumoniae polypeptide from prokaryotic cells using polymerase chain reaction (PCR) technology.

In certain embodiments, it is advantageous to employ a polynucleotide of the present invention in combination with an appropriate label for detecting hybrid formation. A wide variety of appropriate labels are known in the art, including radioactive, enzymatic or other ligands, such as avidin/biotin, which are capable of giving a detectable signal.

Polynucleotides which are identical or sufficiently identical to a nucleotide sequence contained in one of SEQ ID NO:1 through SEQ ID NO:215 or SEQ ID NO: 431 through SEQ ID NO: 591, or a fragment thereof, may be used as hybridization probes for cDNA and genomic DNA or as primers for a nucleic acid amplification (PCR) reaction, to isolate full-length cDNAs and genomic clones encoding polypeptides of the present invention and to isolate cDNA and genomic clones of other genes (including genes encoding homologs and orthologs from species other than Streptococcus pneumoniae) that have a high sequence similarity to the polynucleotide sequences set forth in SEQ ID NO:1 through SEQ ID NO:215 or SEQ ID NO: 431 through SEQ ID NO: 591, or a fragment thereof. Typically these nucleotide sequences are from at least about 70% identical to at least about 95% identical to that of the reference polynucleotide sequence. The probes or primers will generally comprise at least 15 nucleotides, preferably at least 30 nucleotides, and may have at least 50 nucleotides. Particularly preferred probes will have between 30 and 50 nucleotides.

There are several methods available and well known to those skilled in the art to obtain full-length cDNAs, or extend short cDNAs, for example those based on the method of Rapid Amplification of cDNA Ends (RACE) (see Frohman et al., 1988). Recent modifications of the technique, exemplified by the Marathon™ technology (Clontech Laboratories Inc.) for example, have significantly simplified the search for longer cDNAs. In the Marathon™ technology, cDNAs have been prepared from mRNA extracted from a chosen tissue and an “adaptor” sequence ligated onto each end. Nucleic acid amplification (PCR) is then carried out to amplify the “missing” 5′ end of the cDNA using a combination of gene specific and adaptor specific oligonucleotide primers. The PCR reaction is then repeated using “nested” primers, that is, primers designed to anneal within the amplified product (typically an adaptor specific primer that anneals further 3′ in the adaptor sequence and a gene specific primer that anneals further 5′ in the known gene sequence). The products of this reaction can then be analyzed by DNA sequencing and a full-length cDNA constructed either by joining the product directly to the existing cDNA to give a complete sequence, or carrying out a separate full-length PCR using the new sequence information for the design of the 5′ primer.

To provide certain of the advantages in accordance with the present invention, a preferred nucleic acid sequence employed for hybridization studies or assays includes probe molecules that are complementary to a stretch of at least 10 to about 70 nucleotides of a polynucleotide that encodes a Streptococcus pneumoniae polypeptide, such as that shown in one of SEQ ID NO:216 through SEQ ID NO:430 or SEQ ID NO: 592 through SEQ ID NO: 752. A size of at least 10 nucleotides in length helps to ensure that the fragment will be of sufficient length to form a duplex molecule that is both stable and selective. Molecules having complementary sequences over stretches greater than 10 bases in length are generally preferred, though, in order to increase stability and selectivity of the hybrid, and thereby improve the quality and degree of specific hybrid molecules obtained. One will generally prefer to design nucleic acid molecules having gene-complementary stretches of 25 to 40 nucleotides, 55 to 70 nucleotides, or even longer where desired. Such fragments can be readily prepared by, for example, directly synthesizing the fragment by chemical means, by application of nucleic acid reproduction technology, such as the PCR technology of U.S. Pat. No. 4,683,202 (incorporated hereinafter by reference), or by excising selected DNA fragments from recombinant plasmids containing appropriate inserts and suitable restriction enzyme sites.

In another aspect, the present invention contemplates an isolated and purified polynucleotide comprising a nucleotide sequence that is identical or complementary to a segment of at least 10 contiguous bases of one of SEQ ID NO:1 through SEQ ID NO:215 or SEQ ID NO: 431 through SEQ ID NO: 591, wherein the polynucleotide hybridizes to a polynucleotide that encodes a Streptococcus pneumoniae polypeptide. Preferably, the isolated and purified polynucleotide comprises a base sequence that is identical or complementary to a segment of at least 25 to about 70 contiguous bases of one of SEQ ID NO:1 through SEQ ID NO:215 or SEQ ID NO: 431 through SEQ ID NO: 591. For example, the polynucleotide of the invention can comprise a segment of bases identical or complementary to 40 or 55 contiguous bases of the disclosed nucleotide sequences.
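Whether a candidate probe is identical or complementary to a contiguous segment of a reference sequence can be checked with a substring test on the probe and on its reverse complement. The Python sketch below is a simplified illustration that requires an exact match over the whole probe; the function names and the toy reference sequence are assumptions and are not taken from the sequence listing.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def matches_segment(probe, reference, min_len=10):
    """True if the probe (at least `min_len` bases long) is identical or
    complementary to a contiguous segment of the reference."""
    probe, reference = probe.upper(), reference.upper()
    if len(probe) < min_len:
        return False
    return probe in reference or reverse_complement(probe) in reference


if __name__ == "__main__":
    reference = "ATGGCTAGCTAGGATCCGATTACA"   # toy sequence for illustration
    sense_probe = reference[3:15]            # 12 contiguous bases
    antisense_probe = reverse_complement(sense_probe)
    print(matches_segment(sense_probe, reference))
    print(matches_segment(antisense_probe, reference))
```

In practice hybridization tolerates some mismatches depending on stringency, so an exact substring test is a conservative stand-in for the identity and complementarity language used above.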

Accordingly, a polynucleotide probe molecule of the invention can be used for its ability to selectively form duplex molecules with complementary stretches of the gene. Depending on the application envisioned, one will desire to employ varying conditions of hybridization to achieve varying degrees of selectivity of the probe toward the target sequence (see Table 15 below). For applications requiring a high degree of selectivity, one will typically desire to employ relatively stringent conditions to form the hybrids. Of course, for some applications, for example, where one desires to prepare mutants employing a mutant primer strand hybridized to an underlying template or where one seeks to isolate a Streptococcus pneumoniae homologous polypeptide coding sequence from other cells, functional equivalents, or the like, less stringent hybridization conditions are typically needed to allow formation of the heteroduplex (see Table 15). Cross-hybridizing species can thereby be readily identified as positively hybridizing signals with respect to control hybridizations. Thus, hybridization conditions are readily manipulated, and the conditions chosen will generally depend on the desired results.

In any case, it is generally appreciated that conditions can be rendered more stringent by the addition of increasing amounts of formamide, which serves to destabilize the hybrid duplex in the same manner as increased temperature.

The present invention also includes polynucleotides capable of hybridizing under reduced stringency conditions, more preferably stringent conditions, and most preferably highly stringent conditions, to polynucleotides described hereinafter. Examples of stringency conditions are shown in the table below: highly stringent conditions are those that are at least as stringent as, for example, conditions A-F; stringent conditions are at least as stringent as, for example, conditions G-L; and reduced stringency conditions are at least as stringent as, for example, conditions M-R.

TABLE 15
Stringency Conditions

Stringency Condition | Polynucleotide Hybrid | Hybrid Length (bp)^(I) | Hybridization Temperature and Buffer^(H) | Wash Temperature and Buffer^(H)
A | DNA:DNA | >50 | 65° C.; 1xSSC -or- 42° C.; 1xSSC, 50% formamide | 65° C.; 0.3xSSC
B | DNA:DNA | <50 | T_(B); 1xSSC | T_(B); 1xSSC
C | DNA:RNA | >50 | 67° C.; 1xSSC -or- 45° C.; 1xSSC, 50% formamide | 67° C.; 0.3xSSC
D | DNA:RNA | <50 | T_(D); 1xSSC | T_(D); 1xSSC
E | RNA:RNA | >50 | 70° C.; 1xSSC -or- 50° C.; 1xSSC, 50% formamide | 70° C.; 0.3xSSC
F | RNA:RNA | <50 | T_(F); 1xSSC | T_(F); 1xSSC
G | DNA:DNA | >50 | 65° C.; 4xSSC -or- 42° C.; 4xSSC, 50% formamide | 65° C.; 1xSSC
H | DNA:DNA | <50 | T_(H); 4xSSC | T_(H); 4xSSC
I | DNA:RNA | >50 | 67° C.; 4xSSC -or- 45° C.; 4xSSC, 50% formamide | 67° C.; 1xSSC
J | DNA:RNA | <50 | T_(J); 4xSSC | T_(J); 4xSSC
K | RNA:RNA | >50 | 70° C.; 4xSSC -or- 50° C.; 4xSSC, 50% formamide | 67° C.; 1xSSC
L | RNA:RNA | <50 | T_(L); 2xSSC | T_(L); 2xSSC
M | DNA:DNA | >50 | 50° C.; 4xSSC -or- 40° C.; 6xSSC, 50% formamide | 50° C.; 2xSSC
N | DNA:DNA | <50 | T_(N); 6xSSC | T_(N); 6xSSC
O | DNA:RNA | >50 | 55° C.; 4xSSC -or- 42° C.; 6xSSC, 50% formamide | 55° C.; 2xSSC
P | DNA:RNA | <50 | T_(P); 6xSSC | T_(P); 6xSSC
Q | RNA:RNA | >50 | 60° C.; 4xSSC -or- 45° C.; 6xSSC, 50% formamide | 60° C.; 2xSSC
R | RNA:RNA | <50 | T_(R); 4xSSC | T_(R); 4xSSC

(bp)^(I): The hybrid length is that anticipated for the hybridized region(s) of the hybridizing polynucleotides. When hybridizing a polynucleotide to a target polynucleotide of unknown sequence, the hybrid length is assumed to be that of the hybridizing polynucleotide. When polynucleotides of known sequence are hybridized, the hybrid length can be determined by aligning the sequences of the polynucleotides and identifying the region or regions of optimal sequence complementarity.

Buffer^(H): SSPE (1xSSPE is 0.15M NaCl, 10 mM NaH₂PO₄, and 1.25 mM EDTA, pH 7.4) can be substituted for SSC (1xSSC is 0.15M NaCl and 15 mM sodium citrate) in the hybridization and wash buffers; washes are performed for 15 minutes after hybridization is complete.

T_(B) through T_(R): The hybridization temperature for hybrids anticipated to be less than 50 base pairs in length should be 5-10° C. less than the melting temperature (T_(m)) of the hybrid, where T_(m) is determined according to the following equations. For hybrids less than 18 base pairs in length, T_(m)(° C.) = 2(# of A + T bases) + 4(# of G + C bases). For hybrids between 18 and 49 base pairs in length, T_(m)(° C.) = 81.5 + 16.6(log₁₀[Na⁺]) + 0.41(% G + C) − (600/N), where N is the number of bases in the hybrid, and [Na⁺] is the concentration of sodium ions in the hybridization buffer ([Na⁺] for 1xSSC = 0.165M).
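By way of illustration only, the two T_(m) equations given in the footnote to Table 15 can be sketched in Python as follows. The function names, the treatment of U as equivalent to T, and the assumption that [Na⁺] scales linearly with the SSC concentration (0.165M per 1x) are illustrative assumptions and are not taken from the specification.

```python
import math


def melting_temp_c(seq: str, na_molar: float = 0.165) -> float:
    """Estimate T_m (deg C) of a hybrid shorter than 50 bases.

    Uses the two equations from the Table 15 footnote. Counting U
    together with T is an assumption made here for RNA strands.
    """
    s = seq.upper()
    n = len(s)
    if not 0 < n < 50:
        raise ValueError("the equations apply only to hybrids of 1-49 bases")
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T") + s.count("U")
    if gc + at != n:
        raise ValueError("sequence contains characters other than A, C, G, T, U")
    if n < 18:
        # T_m = 2(# of A + T bases) + 4(# of G + C bases)
        return 2.0 * at + 4.0 * gc
    # T_m = 81.5 + 16.6 log10[Na+] + 0.41(% G + C) - 600/N
    percent_gc = 100.0 * gc / n
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * percent_gc - 600.0 / n


def hybridization_window_c(seq: str, ssc_fold: float = 1.0) -> tuple:
    """Return (T_m - 10, T_m - 5), i.e. 5-10 deg C below T_m.

    Assumes [Na+] = 0.165 M x ssc_fold (linear scaling is an assumption).
    """
    tm = melting_temp_c(seq, na_molar=0.165 * ssc_fold)
    return (tm - 10.0, tm - 5.0)
```

For a hypothetical 12-base probe with six G+C bases, the first equation gives 2(6) + 4(6) = 36° C., so the sketch would suggest hybridizing at roughly 26-31° C.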

Additional examples of stringency conditions for polynucleotide hybridization are provided in Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., chapters 9 and 11, and Ausubel et al., 1995, Current Protocols in Molecular Biology, eds., John Wiley & Sons, Inc., sections 2.10 and 6.3-6.4, incorporated hereinafter by reference.

In addition to the nucleic acid molecules encoding Streptococcus pneumoniae polypeptides described above, another aspect of the invention pertains to isolated nucleic acid molecules which are antisense thereto. An “antisense” nucleic acid comprises a nucleotide sequence which is complementary to a “sense” nucleic acid encoding a protein, e.g., complementary to the coding strand of a double-stranded cDNA molecule or complementary to an mRNA sequence. Accordingly, an antisense nucleic acid can hydrogen bond to a sense nucleic acid. The antisense nucleic acid can be complementary to an entire Streptococcus pneumoniae coding strand, or to only a fragment thereof. In one embodiment, an antisense nucleic acid molecule is antisense to a “coding region” of the coding strand of a nucleotide sequence encoding a Streptococcus pneumoniae polypeptide.

The term “coding region” refers to the region of the nucleotide sequence comprising codons which are translated into amino acid residues, e.g., the entire coding region of one of SEQ ID NO:1 through SEQ ID NO:215 or SEQ ID NO: 431 through SEQ ID NO: 591. In another embodiment, the antisense nucleic acid molecule is antisense to a “noncoding region” of the coding strand of a nucleotide sequence encoding a Streptococcus pneumoniae polypeptide. The term “noncoding region” refers to 5′ and 3′ sequences that flank the coding region and that are not translated into amino acids (i.e., also referred to as 5′ and 3′ untranslated regions).

Given the coding strand sequence encoding the Streptococcus pneumoniae polypeptide disclosed hereinafter (e.g., one of SEQ ID NO:1 through SEQ ID NO:215 or SEQ ID NO: 431 through SEQ ID NO: 591), antisense nucleic acids of the invention can be designed according to the rules of Watson and Crick base pairing. The antisense nucleic acid molecule can be complementary to the entire coding region of Streptococcus pneumoniae mRNA, but more preferably is an oligonucleotide which is antisense to only a fragment of the coding or noncoding region of Streptococcus pneumoniae mRNA. For example, the antisense oligonucleotide can be complementary to the region surrounding the translation start site of Streptococcus pneumoniae mRNA.
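As a minimal sketch of designing an antisense oligonucleotide by Watson and Crick base pairing, the reverse complement of a chosen window of the coding strand can be computed as shown below. The function name, the DNA-only alphabet and the example sequence are hypothetical and are used for illustration only; they are not sequences of the invention.

```python
# Watson-Crick pairing for a DNA oligonucleotide (A-T, G-C).
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def antisense_oligo(coding_strand: str, start: int, length: int) -> str:
    """Return the antisense oligonucleotide (written 5' to 3') for the
    window coding_strand[start:start + length] of the coding strand."""
    if start < 0 or length < 1 or start + length > len(coding_strand):
        raise ValueError("window lies outside the coding strand")
    window = coding_strand[start:start + length].upper()
    # Complement each base, then reverse so the result reads 5' to 3'.
    return "".join(_COMPLEMENT[base] for base in reversed(window))


# Hypothetical coding strand with an ATG start codon at index 8.
example = "AGGAGGTAATGAAAGCATTA"
# 15-mer covering the region surrounding the translation start site.
oligo = antisense_oligo(example, 3, 15)
```

The window position and length are free parameters; the specification mentions lengths of about 5 to 50 nucleotides.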

An antisense oligonucleotide can be, for example, about 5, 10, 15, 20, 25, 30, 35, 40, 45 or 50 nucleotides in length. An antisense nucleic acid of the invention can be constructed by chemical synthesis and enzymatic ligation reactions using procedures known in the art. For example, an antisense nucleic acid (e.g., an antisense oligonucleotide) can be chemically synthesized using naturally occurring nucleotides or variously modified nucleotides designed to increase the biological stability of the molecules or to increase the physical stability of the duplex formed between the antisense and sense nucleic acids, e.g., phosphorothioate derivatives and acridine substituted nucleotides can be used. Examples of modified nucleotides which can be used to generate the antisense nucleic acid include 5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine, xanthine, 4-acetylcytosine, 5-(carboxyhydroxylmethyl)uracil, 5-carboxymethylaminomethyl-2-thiouridine, 5-carboxymethylaminomethyluracil, dihydrouracil, beta-D-galactosylqueosine, inosine, N6-isopentenyladenine, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-adenine, 7-methylguanine, 5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil, beta-D-mannosylqueosine, 5′-methoxycarboxymethyluracil, 5-methoxyuracil, 2-methylthio-N-6-isopentenyladenine, uracil-5-oxyacetic acid (v), wybutoxosine, pseudouracil, queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil, 5-methyluracil, uracil-5-oxyacetic acid methylester, 3-(3-amino-3-N-2-carboxypropyl)uracil, (acp3)w, and 2,6-diaminopurine.

Alternatively, the antisense nucleic acid can be produced biologically using an expression vector into which a nucleic acid has been subcloned in an antisense orientation (i.e., RNA transcribed from the inserted nucleic acid will be of an antisense orientation to a target nucleic acid of interest, described further in the following subsection).

The antisense nucleic acid molecules of the invention are typically administered to a subject or generated in situ such that they hybridize with or bind to cellular mRNA and/or genomic DNA encoding a Streptococcus pneumoniae polypeptide to thereby inhibit expression of the polypeptide, e.g., by inhibiting transcription and/or translation. The hybridization can be by conventional nucleotide complementarity to form a stable duplex, or, for example, in the case of an antisense nucleic acid molecule which binds to DNA duplexes, through specific interactions in the major groove of the double helix. An example of a route of administration of an antisense nucleic acid molecule of the invention includes direct injection at a tissue site. Alternatively, an antisense nucleic acid molecule can be modified to target selected cells and then administered systemically. For example, for systemic administration, an antisense molecule can be modified such that it specifically binds to a receptor or an antigen expressed on a selected cell surface, e.g., by linking the antisense nucleic acid molecule to a peptide or an antibody which binds to a cell surface receptor or antigen. The antisense nucleic acid molecule can also be delivered to cells using the vectors described hereinafter.

In yet another embodiment, the antisense nucleic acid molecule of the invention is an α-anomeric nucleic acid molecule. An α-anomeric nucleic acid molecule forms specific double-stranded hybrids with complementary RNA in which, contrary to the usual β-units, the strands run parallel to each other (Gaultier et al., 1987). The antisense nucleic acid molecule can also comprise a 2′-O-methylribonucleotide (Inoue et al., 1987(a)) or a chimeric RNA-DNA analogue (Inoue et al., 1987(b)).

In still another embodiment, an antisense nucleic acid of the invention is a ribozyme. Ribozymes are catalytic RNA molecules with ribonuclease activity which are capable of cleaving a single-stranded nucleic acid, such as an mRNA, to which they have a complementary region. Thus, ribozymes (e.g., hammerhead ribozymes (described in Haselhoff and Gerlach, 1988)) can be used to catalytically cleave Streptococcus pneumoniae mRNA transcripts to thereby inhibit translation of Streptococcus pneumoniae mRNA. A ribozyme having specificity for a Streptococcus pneumoniae-encoding nucleic acid can be designed based upon the nucleotide sequence of a Streptococcus pneumoniae cDNA disclosed hereinafter (i.e., SEQ ID NO:1 through SEQ ID NO:215 or SEQ ID NO: 431 through SEQ ID NO: 591). For example, a derivative of a Tetrahymena L-19 IVS RNA can be constructed in which the nucleotide sequence of the active site is complementary to the nucleotide sequence to be cleaved in a Streptococcus pneumoniae-encoding mRNA. See, e.g., Cech et al. U.S. Pat. No. 4,987,071 and Cech et al. U.S. Pat. No. 5,116,742, both incorporated by reference. Alternatively, Streptococcus pneumoniae mRNA can be used to select a catalytic RNA having a specific ribonuclease activity from a pool of RNA molecules. See, e.g., Bartel and Szostak, 1993.

Alternatively, Streptococcus pneumoniae gene expression can be inhibited by targeting nucleotide sequences complementary to the regulatory region of the Streptococcus pneumoniae gene (e.g., the Streptococcus pneumoniae gene promoter and/or enhancers) to form triple helical structures that prevent transcription of the Streptococcus pneumoniae gene in target cells. See generally, Helene, 1991; Helene et al., 1992; and Maher, 1992.

Streptococcus pneumoniae gene expression can also be inhibited using RNA interference (RNAi). This is a technique for post-transcriptional gene silencing (PTGS), in which target gene activity is specifically abolished with cognate double-stranded RNA (dsRNA). RNAi resembles in many aspects PTGS in plants and has been detected in many invertebrates including trypanosome, hydra, planaria, nematode and fruit fly (Drosophila melanogaster). It may be involved in the modulation of transposable element mobilization and antiviral state formation. RNAi in mammalian systems is disclosed in International Application WO 00/63364, which is incorporated by reference hereinafter in its entirety. Basically, dsRNA of at least about 600 nucleotides, homologous to the target, is introduced into the cell and a sequence-specific reduction in gene activity is observed.

C. Streptococcus pneumoniae Polypeptides

In particular embodiments, the present invention provides isolated and purified Streptococcus pneumoniae polypeptides. Preferably, a Streptococcus pneumoniae polypeptide of the invention is a recombinant polypeptide. In certain embodiments, a Streptococcus pneumoniae polypeptide of the present invention comprises an amino acid sequence that has at least 95% identity to the amino acid sequence of one of SEQ ID NO:216 through SEQ ID NO:430 or SEQ ID NO: 592 through SEQ ID NO: 752, a biological equivalent thereof, or a fragment thereof.

A Streptococcus pneumoniae polypeptide according to the present invention encompasses a polypeptide that comprises: 1) the amino acid sequence shown in one of SEQ ID NO:216 through SEQ ID NO:430 or SEQ ID NO: 592 through SEQ ID NO: 752; 2) functional and non-functional naturally occurring variants or biological equivalents of Streptococcus pneumoniae polypeptides of SEQ ID NO:216 through SEQ ID NO:430 or SEQ ID NO: 592 through SEQ ID NO: 752; 3) recombinantly produced variants or biological equivalents of Streptococcus pneumoniae polypeptides of SEQ ID NO:216 through SEQ ID NO:430 or SEQ ID NO: 592 through SEQ ID NO: 752; and 4) polypeptides isolated from organisms other than Streptococcus pneumoniae (orthologues of Streptococcus pneumoniae polypeptides).

A biological equivalent or variant of a Streptococcus pneumoniae polypeptide according to the present invention encompasses 1) a polypeptide isolated from Streptococcus pneumoniae; and 2) a polypeptide that has substantial homology to a Streptococcus pneumoniae polypeptide.

Biological equivalents or variants of Streptococcus pneumoniae polypeptides include both functional and non-functional Streptococcus pneumoniae polypeptides. Functional biological equivalents or variants are naturally occurring amino acid sequence variants of a Streptococcus pneumoniae polypeptide that maintain the ability to elicit an immunological or antigenic response in a subject. Functional variants will typically contain only conservative substitution of one or more amino acids of one of SEQ ID NO:216 through SEQ ID NO:430 or SEQ ID NO: 592 through SEQ ID NO: 752, or substitution, deletion or insertion of non-critical residues in non-critical regions of the polypeptide (e.g., not in regions containing antigenic determinants or protective epitopes).

The present invention further provides non-Streptococcus pneumoniae orthologues of Streptococcus pneumoniae polypeptides. Orthologues of Streptococcus pneumoniae polypeptides are polypeptides that are isolated from non-Streptococcus pneumoniae organisms and possess antigenic capabilities of the Streptococcus pneumoniae polypeptide. Orthologues of a Streptococcus pneumoniae polypeptide can readily be identified as comprising an amino acid sequence that is substantially homologous to one of SEQ ID NO:216 through SEQ ID NO:430 or SEQ ID NO: 592 through SEQ ID NO: 752.

Modifications and changes can be made in the structure of a polypeptide of the present invention and still obtain a molecule having Streptococcus pneumoniae antigenicity. For example, certain amino acids can be substituted for other amino acids in a sequence without appreciable loss of antigenicity. Because it is the interactive capacity and nature of a polypeptide that defines that polypeptide's biological functional activity, certain amino acid sequence substitutions can be made in a polypeptide sequence (or, of course, its underlying DNA coding sequence) and nevertheless obtain a polypeptide with like properties.

In making such changes, the hydropathic index of amino acids can be considered. The importance of the hydropathic amino acid index in conferring interactive biologic function on a polypeptide is generally understood in the art (Kyte & Doolittle, 1982). It is known that certain amino acids can be substituted for other amino acids having a similar hydropathic index or score and still result in a polypeptide with similar biological activity. Each amino acid has been assigned a hydropathic index on the basis of its hydrophobicity and charge characteristics. Those indices are: isoleucine (+4.5); valine (+4.2); leucine (+3.8); phenylalanine (+2.8); cysteine/cystine (+2.5); methionine (+1.9); alanine (+1.8); glycine (−0.4); threonine (−0.7); serine (−0.8); tryptophan (−0.9); tyrosine (−1.3); proline (−1.6); histidine (−3.2); glutamate (−3.5); glutamine (−3.5); aspartate (−3.5); asparagine (−3.5); lysine (−3.9); and arginine (−4.5).

It is believed that the relative hydropathic character of the amino acid residue determines the secondary and tertiary structure of the resultant polypeptide, which in turn defines the interaction of the polypeptide with other molecules, such as enzymes, substrates, receptors, antibodies, antigens, and the like. It is known in the art that an amino acid can be substituted by another amino acid having a similar hydropathic index and still obtain a functionally equivalent polypeptide. In such changes, the substitution of amino acids whose hydropathic indices are within +/−2 is preferred, those that are within +/−1 are particularly preferred, and those within +/−0.5 are even more particularly preferred.
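The hydropathic index comparison described above can be sketched as a simple lookup. The index values are those listed above; the one-letter residue codes, the function name and the rounding of the difference to one decimal place (to avoid floating-point artifacts at the tier boundaries) are illustrative assumptions.

```python
# Hydropathic indices as listed above, keyed by one-letter residue code.
HYDROPATHIC_INDEX = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9,
    "R": -4.5,
}


def hydropathy_tier(original: str, substitute: str):
    """Return the tightest tier (0.5, 1.0 or 2.0) that contains the
    difference in hydropathic index, or None if the difference exceeds 2."""
    diff = abs(HYDROPATHIC_INDEX[original] - HYDROPATHIC_INDEX[substitute])
    diff = round(diff, 1)  # indices carry one decimal place
    for tier in (0.5, 1.0, 2.0):
        if diff <= tier:
            return tier
    return None
```

Under this sketch, isoleucine to valine (difference 0.3) falls in the most preferred tier, lysine to arginine (0.6) in the second tier, and isoleucine to arginine (9.0) in none.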

Substitution of like amino acids can also be made on the basis of hydrophilicity, particularly where the biological functional equivalent polypeptide or peptide thereby created is intended for use in immunological embodiments. U.S. Pat. No. 4,554,101, incorporated hereinafter by reference, states that the greatest local average hydrophilicity of a polypeptide, as governed by the hydrophilicity of its adjacent amino acids, correlates with its immunogenicity and antigenicity, i.e. with a biological property of the polypeptide.

As detailed in U.S. Pat. No. 4,554,101, the following hydrophilicity values have been assigned to amino acid residues: arginine (+3.0); lysine (+3.0); aspartate (+3.0±1); glutamate (+3.0±1); serine (+0.3); asparagine (+0.2); glutamine (+0.2); glycine (0); proline (−0.5±1); threonine (−0.4); alanine (−0.5); histidine (−0.5); cysteine (−1.0); methionine (−1.3); valine (−1.5); leucine (−1.8); isoleucine (−1.8); tyrosine (−2.3); phenylalanine (−2.5); tryptophan (−3.4). It is understood that an amino acid can be substituted for another having a similar hydrophilicity value and still obtain a biologically equivalent, and in particular, an immunologically equivalent polypeptide. In such changes, the substitution of amino acids whose hydrophilicity values are within ±2 is preferred, those that are within ±1 are particularly preferred, and those within ±0.5 are even more particularly preferred.
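The notion of a greatest local average hydrophilicity can be illustrated with a sliding-window average over the values listed above. This is a sketch only: the window of six residues, the use of the nominal values for residues listed with a ±1 range, the function name and the example peptide are all assumptions made for illustration.

```python
# Hydrophilicity values as listed above (nominal values used where a
# +/-1 range is given), keyed by one-letter residue code.
HYDROPHILICITY = {
    "R": 3.0, "K": 3.0, "D": 3.0, "E": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": -0.5, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "L": -1.8, "I": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}


def most_hydrophilic_window(seq: str, window: int = 6):
    """Return (start_index, mean_value) of the window with the greatest
    average hydrophilicity; the first such window wins on ties."""
    if window < 1 or len(seq) < window:
        raise ValueError("sequence is shorter than the window")
    best_start, best_mean = 0, float("-inf")
    for start in range(len(seq) - window + 1):
        values = [HYDROPHILICITY[aa] for aa in seq[start:start + window]]
        mean = sum(values) / window
        if mean > best_mean:
            best_start, best_mean = start, mean
    return best_start, best_mean


# Hypothetical peptide: a charged stretch flanked by hydrophobic residues.
start, mean = most_hydrophilic_window("AVLIKDERKDAVLI")
```

In the hypothetical peptide, the window beginning at index 4 (KDERKD) has the greatest average value, 3.0.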

As outlined above, amino acid substitutions are generally therefore based on the relative similarity of the amino acid side-chain substituents, for example, their hydrophobicity, hydrophilicity, charge, size, and the like. Exemplary substitutions which take various of the foregoing characteristics into consideration are well known to those of skill in the art and include: arginine and lysine; glutamate and aspartate; serine and threonine; glutamine and asparagine; and valine, leucine and isoleucine (see Table 16, below). The present invention thus contemplates functional or biological equivalents of a Streptococcus pneumoniae polypeptide as set forth above.

TABLE 16
Amino Acid Substitutions

Original Residue | Exemplary Residue Substitution
Ala | Gly; Ser
Arg | Lys
Asn | Gln; His
Asp | Glu
Cys | Ser
Gln | Asn
Glu | Asp
Gly | Ala
His | Asn; Gln
Ile | Leu; Val
Leu | Ile; Val
Lys | Arg
Met | Leu; Tyr
Ser | Thr
Thr | Ser
Trp | Tyr
Tyr | Trp; Phe
Val | Ile; Leu
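For illustration, Table 16 can be encoded as a lookup and used to enumerate every variant of a peptide that carries exactly one exemplary substitution. The one-letter codes, the function name and the choice to leave residues absent from Table 16 (e.g., Phe and Pro) unchanged are assumptions of this sketch.

```python
# Table 16 keyed by one-letter code; values are the exemplary substitutes.
TABLE_16 = {
    "A": "GS", "R": "K", "N": "QH", "D": "E", "C": "S", "Q": "N",
    "E": "D", "G": "A", "H": "NQ", "I": "LV", "L": "IV", "K": "R",
    "M": "LY", "S": "T", "T": "S", "W": "Y", "Y": "WF", "V": "IL",
}


def single_substitution_variants(peptide: str) -> list:
    """List all variants of the peptide that differ from it by exactly one
    Table 16 substitution, in order of position."""
    variants = []
    for i, residue in enumerate(peptide):
        # Residues not listed in Table 16 yield no variants at that position.
        for replacement in TABLE_16.get(residue, ""):
            variants.append(peptide[:i] + replacement + peptide[i + 1:])
    return variants
```

For the hypothetical dipeptide "AK", the sketch yields "GK", "SK" and "AR".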

Biological or functional equivalents of a polypeptide can also be prepared using site-specific mutagenesis. Site-specific mutagenesis is a technique useful in the preparation of second generation polypeptides, or biologically functional equivalent polypeptides or peptides, derived from the sequences thereof, through specific mutagenesis of the underlying DNA. As noted above, such changes can be desirable where amino acid substitutions are desirable. The technique further provides a ready ability to prepare and test sequence variants, for example, incorporating one or more of the foregoing considerations, by introducing one or more nucleotide sequence changes into the DNA. Site-specific mutagenesis allows the production of mutants through the use of specific oligonucleotide sequences which encode the DNA sequence of the desired mutation, as well as a sufficient number of adjacent nucleotides, to provide a primer sequence of sufficient size and sequence complexity to form a stable duplex on both sides of the deletion junction being traversed. Typically, a primer of about 17 to 25 nucleotides in length is preferred, with about 5 to 10 residues on both sides of the junction of the sequence being altered.
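The primer geometry described above (a changed position flanked by about 5 to 10 unaltered residues on each side) can be sketched as follows for a single-base change. The function name, the default flank of 8 and the toy template are hypothetical. The primer is written in the same sense as the strand supplied, so its reverse complement would be needed if it is to anneal to that same strand.

```python
def mutagenic_primer(template: str, pos: int, new_base: str, flank: int = 8) -> str:
    """Build a primer carrying a single-base change at template[pos],
    with `flank` unaltered template bases on each side."""
    if not 5 <= flank <= 10:
        raise ValueError("flank should be about 5 to 10 residues")
    if len(new_base) != 1:
        raise ValueError("this sketch handles single-base changes only")
    if pos - flank < 0 or pos + flank >= len(template):
        raise ValueError("not enough flanking sequence around pos")
    if template[pos] == new_base:
        raise ValueError("new_base does not alter the template")
    left = template[pos - flank:pos]
    right = template[pos + 1:pos + 1 + flank]
    return left + new_base + right


# Toy 24-base template; change the A at index 12 to G.
primer = mutagenic_primer("ACGT" * 6, 12, "G")
```

With a flank of 8 the primer is 17 nucleotides long, the lower end of the range stated above; a flank of 10 would give 21.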

In general, the technique of site-specific mutagenesis is well known in the art. As will be appreciated, the technique typically employs a phage vector which can exist in both a single-stranded and a double-stranded form. Typically, site-directed mutagenesis in accordance herewith is performed by first obtaining a single-stranded vector which includes within its sequence a DNA sequence which encodes all or a portion of the Streptococcus pneumoniae polypeptide sequence selected. An oligonucleotide primer bearing the desired mutated sequence is prepared (e.g., synthetically). This primer is then annealed to the single-stranded vector, and extended by the use of enzymes such as E. coli polymerase I Klenow fragment, in order to complete the synthesis of the mutation-bearing strand. Thus, a heteroduplex is formed wherein one strand encodes the original non-mutated sequence and the second strand bears the desired mutation. This heteroduplex vector is then used to transform appropriate cells such as E. coli cells, and clones are selected which include recombinant vectors bearing the mutation. Commercially available kits come with all the reagents necessary, except the oligonucleotide primers.

A Streptococcus pneumoniae polypeptide or polypeptide antigen of the present invention is understood to be any Streptococcus pneumoniae polypeptide comprising substantial sequence similarity, structural similarity and/or functional similarity to a Streptococcus pneumoniae polypeptide comprising the amino acid sequence of one of SEQ ID NO:216 through SEQ ID NO:430 or SEQ ID NO: 592 through SEQ ID NO: 752. In addition, a Streptococcus pneumoniae polypeptide or polypeptide antigen of the invention is not limited to a particular source. Thus, the invention provides for the general detection and isolation of the polypeptides from a variety of sources.

It is contemplated in the present invention that a Streptococcus pneumoniae polypeptide may advantageously be cleaved into fragments for use in further structural or functional analysis, or in the generation of reagents such as Streptococcus pneumoniae-related polypeptides and Streptococcus pneumoniae-specific antibodies. This can be accomplished by treating purified or unpurified Streptococcus pneumoniae polypeptides with a peptidase such as endoproteinase glu-C (Boehringer, Indianapolis, Ind.). Treatment with CNBr is another method by which peptide fragments may be produced from natural Streptococcus pneumoniae polypeptides. Recombinant techniques also can be used to produce specific fragments of a Streptococcus pneumoniae polypeptide.

In addition, the inventors also contemplate that compounds sterically similar to a particular Streptococcus pneumoniae polypeptide antigen, called peptidomimetics, may be formulated to mimic the key portions of the peptide structure. Mimetics are peptide-containing molecules which mimic elements of protein secondary structure (see, e.g., Johnson et al., 1993). The underlying rationale behind the use of peptide mimetics is that the peptide backbone of proteins exists chiefly to orient amino acid side chains in such a way as to facilitate molecular interactions, such as those of receptor and ligand. Successful applications of the peptide mimetic concept have thus far focused on mimetics of β-turns within proteins. Likely β-turn structures within Streptococcus pneumoniae polypeptides can be predicted by computer-based algorithms as discussed above. Once the component amino acids of the turn are determined, mimetics can be constructed to achieve a similar spatial orientation of the essential elements of the amino acid side chains, as discussed in Johnson et al., 1993.

Fragments of the Streptococcus pneumoniae polypeptides are also included in the invention. A fragment is a polypeptide having an amino acid sequence that is entirely the same as part, but not all, of the amino acid sequence of the reference polypeptide. The fragment can comprise, for example, at least 7 (e.g., 8, 10, 12, 14, 16, 18, 20, or more) contiguous amino acids of an amino acid sequence of one of SEQ ID NO: 216 through SEQ ID NO: 430 or SEQ ID NO:592 through SEQ ID NO: 752. Fragments may be “freestanding” or comprised within a larger polypeptide of which they form a part or region, most preferably as a single, continuous region. In one embodiment, the fragments include at least one epitope of the mature polypeptide sequence.

“Fusion protein” refers to a protein or polypeptide encoded by two, often unrelated, fused genes or fragments thereof. For example, fusion proteins or polypeptides comprising various portions of the constant region of immunoglobulin molecules together with another human protein or part thereof have been described. In many cases, employing an immunoglobulin Fc region as a part of a fusion protein or polypeptide is advantageous for use in therapy and diagnosis, resulting in, for example, improved pharmacokinetic properties (see e.g., International Application EP-A0232 2621). On the other hand, for some uses it would be desirable to be able to delete the Fc part after the fusion protein or polypeptide has been expressed, detected and purified.

D. Streptococcus pneumoniae Polynucleotide and Polypeptide Variants

“Variant,” as the term is used hereinafter, is a polynucleotide or polypeptide that differs from a reference polynucleotide or polypeptide respectively, but retains essential properties. A typical variant of a polynucleotide differs in nucleotide sequence from another, reference polynucleotide. Changes in the nucleotide sequence of the variant may or may not alter the amino acid sequence of a polypeptide encoded by the reference polynucleotide. Nucleotide changes may result in amino acid substitutions, additions, deletions, fusions and truncations in the polypeptide encoded by the reference sequence, as discussed below. A typical variant of a polypeptide differs in amino acid sequence from another, reference polypeptide. Generally, differences are limited so that the sequences of the reference polypeptide and the variant are closely similar overall and, in many regions, identical. A variant and reference polypeptide may differ in amino acid sequence by one or more substitutions, additions or deletions in any combination. A substituted or inserted amino acid residue may or may not be one encoded by the genetic code. A variant of a polynucleotide or polypeptide may be naturally occurring, such as an allelic variant, or it may be a variant that is not known to occur naturally. Non-naturally occurring variants of polynucleotides and polypeptides may be made by mutagenesis techniques or by direct synthesis.

“Identity,” as known in the art, is a relationship between two or more polypeptide sequences or two or more polynucleotide sequences, as determined by comparing the sequences. In the art, “identity” also means the degree of sequence relatedness between polypeptide or polynucleotide sequences, as the case may be, as determined by the match between strings of such sequences. “Identity” and “similarity” can be readily calculated by known methods, including but not limited to those described in Computational Molecular Biology, Lesk, A. M., ed., Oxford University Press, New York, 1988; Biocomputing: Informatics and Genome Projects, Smith, D. W., ed., Academic Press, New York, 1993; Computer Analysis of Sequence Data, Part I, Griffin, A. M., and Griffin, H. G., eds., Humana Press, New Jersey, 1994; Sequence Analysis in Molecular Biology, von Heinje, G., Academic Press, 1987; Sequence Analysis Primer, Gribskov, M. and Devereux, J., eds., M Stockton Press, New York, 1991; and Carillo, H., and Lipman, D., SIAM J. Applied Math., 48: 1073 (1988). Preferred methods to determine identity are designed to give the largest match between the sequences tested. Methods to determine identity and similarity are codified in publicly available computer programs. Preferred computer program methods to determine identity and similarity between two sequences include, but are not limited to, the GCG program package (Devereux, J., et al., 1984), BLASTP, BLASTN, TBLASTN and FASTA (Altschul, S. F., et al., 1990). The BLASTX program is publicly available from NCBI and other sources (BLAST Manual, Altschul, S., et al., NCBI NLM NIH Bethesda, Md. 20894; Altschul, S., et al., 1990). The well known Smith-Waterman algorithm may also be used to determine identity.

By way of example, a polynucleotide sequence of the present invention may be identical to the reference sequence of one of SEQ ID NO:1 through SEQ ID NO:215 or SEQ ID NO: 431 through SEQ ID NO: 591, that is, it may be 100% identical, or it may include up to a certain integer number of nucleotide alterations as compared to the reference sequence. Such alterations are selected from the group consisting of at least one nucleotide deletion, substitution, including transition and transversion, or insertion, and wherein said alterations may occur at the 5′ or 3′ terminal positions of the reference nucleotide sequence or anywhere between those terminal positions, interspersed either individually among the nucleotides in the reference sequence or in one or more contiguous groups within the reference sequence. The number of nucleotide alterations is determined by multiplying the total number of nucleotides in one of SEQ ID NO:1 through SEQ ID NO:215 or SEQ ID NO: 431 through SEQ ID NO: 591 by the numerical percent of the respective percent identity (divided by 100) and subtracting that product from said total number of nucleotides in one of SEQ ID NO:1 through SEQ ID NO:215 or SEQ ID NO: 431 through SEQ ID NO: 591.

For example, an isolated Streptococcus pneumoniae polynucleotide comprising a polynucleotide sequence that has at least 70% identity to the nucleic acid sequence of one of SEQ ID NO:1 through SEQ ID NO:215 or SEQ ID NO: 431 through SEQ ID NO: 591; a degenerate variant thereof or a fragment thereof, wherein the polynucleotide sequence may include up to n_n nucleic acid alterations over the entire polynucleotide region of the nucleic acid sequence of one of SEQ ID NO:1 through SEQ ID NO:215 or SEQ ID NO: 431 through SEQ ID NO: 591, wherein n_n is the maximum number of alterations and is calculated by the formula:

n_n ≤ x_n − (x_n · y),

in which x_n is the total number of nucleotides of one of SEQ ID NO:1 through SEQ ID NO:215 or SEQ ID NO: 431 through SEQ ID NO: 591 and y has a value of 0.70, wherein any non-integer product of x_n and y is rounded down to the nearest integer prior to subtracting such product from x_n. Of course, y may also have a value of 0.80 for 80%, 0.85 for 85%, 0.90 for 90%, 0.95 for 95%, etc. Alterations of a polynucleotide sequence encoding one of the polypeptides of SEQ ID NO:216 through SEQ ID NO:430 or SEQ ID NO: 592 through SEQ ID NO: 752 may create nonsense, missense or frameshift mutations in this coding sequence and thereby alter the polypeptide encoded by the polynucleotide following such alterations.
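The calculation described above can be sketched in Python as follows. This is an illustrative sketch only: the function name `max_alterations` is hypothetical, and the percent identity is passed as a whole-number percentage (e.g., 70 for y = 0.70) so that the rounding-down step is done in exact integer arithmetic rather than floating point.

```python
def max_alterations(total_length: int, percent_identity: int) -> int:
    """Maximum number of alterations n for a sequence of length x at identity y.

    Implements n <= x - floor(x * y), with y given as a whole-number percent.
    Integer floor division rounds any non-integer product of x and y down
    before it is subtracted from x, as the text requires.
    """
    if total_length < 0:
        raise ValueError("total_length must be non-negative")
    if not 0 <= percent_identity <= 100:
        raise ValueError("percent_identity must be between 0 and 100")
    rounded_product = (total_length * percent_identity) // 100
    return total_length - rounded_product
```

Under these assumptions, a 1,000-nucleotide reference sequence at 70% identity permits up to 300 alterations, and a 101-nucleotide sequence at 70% identity permits up to 31 (the product 70.7 is rounded down to 70 before subtraction). The same arithmetic applies when the length is counted in amino acids.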

Similarly, a polypeptide sequence of the present invention may be identical to the reference sequence of SEQ ID NO:216 through SEQ ID NO:430 or SEQ ID NO: 592 through SEQ ID NO: 752, that is, it may be 100% identical, or it may include up to a certain integer number of amino acid alterations as compared to the reference sequence such that the % identity is less than 100%. Such alterations are selected from the group consisting of at least one amino acid deletion, substitution, including conservative and non-conservative substitution, or insertion, and wherein said alterations may occur at the amino- or carboxy-terminal positions of the reference polypeptide sequence or anywhere between those terminal positions, interspersed either individually among the amino acids in the reference sequence or in one or more contiguous groups within the reference sequence. The number of amino acid alterations for a given % identity is determined by multiplying the total number of amino acids in one of SEQ ID NO:216 through SEQ ID NO:430 or SEQ ID NO: 592 through SEQ ID NO: 752 by the numerical percent of the respective percent identity (divided by 100) and then subtracting that product from said total number of amino acids in one of SEQ ID NO:216 through SEQ ID NO:430 or SEQ ID NO: 592 through SEQ ID NO: 752, or:

n_a ≤ x_a − (x_a · y),

wherein n_a is the number of amino acid alterations, x_a is the total number of amino acids in one of SEQ ID NO:216 through SEQ ID NO:430 or SEQ ID NO: 592 through SEQ ID NO: 752, and y is, for instance, 0.70 for 70%, 0.80 for 80%, 0.85 for 85%, etc., and wherein any non-integer product of x_a and y is rounded down to the nearest integer prior to subtracting it from x_a.

E. Vectors, Host Cells and Recombinant Streptococcus pneumoniae Polypeptides

In a preferred embodiment, the present invention provides expression vectors comprising ORF polynucleotides that encode Streptococcus pneumoniae polypeptides. Preferably, the expression vectors of the present invention comprise ORF polynucleotides that encode Streptococcus pneumoniae polypeptides comprising the amino acid residue sequence of one of SEQ ID NO:216 through SEQ ID NO:430 or SEQ ID NO: 592 through SEQ ID NO: 752. More preferably, the expression vectors of the present invention comprise a polynucleotide comprising the nucleotide base sequence of one of SEQ ID NO:1 through SEQ ID NO:215 or SEQ ID NO: 431 through SEQ ID NO: 591. Even more preferably, the expression vectors of the invention comprise a polynucleotide operatively linked to an enhancer-promoter. More preferably still, the expression vectors of the invention comprise a polynucleotide operatively linked to a prokaryotic promoter. Alternatively, the expression vectors of the present invention comprise a polynucleotide operatively linked to an enhancer-promoter that is a eukaryotic promoter, and the expression vectors further comprise a polyadenylation signal that is positioned 3′ of the carboxy-terminal amino acid and within a transcriptional unit of the encoded polypeptide.

Expression of proteins in prokaryotes is most often carried out in E. coli with vectors containing constitutive or inducible promoters directing the expression of either fusion or non-fusion proteins. Fusion vectors add a number of amino acids to a protein encoded therein, usually to the amino terminus of the recombinant protein. Such fusion vectors typically serve three purposes: 1) to increase expression of recombinant protein; 2) to increase the solubility of the recombinant protein; and 3) to aid in the purification of the recombinant protein by acting as a ligand in affinity purification. Often, in fusion expression vectors, a proteolytic cleavage site is introduced at the junction of the fusion moiety and the recombinant protein to enable separation of the recombinant protein from the fusion moiety subsequent to purification of the fusion protein. Such enzymes, and their cognate recognition sequences, include Factor Xa, thrombin and enterokinase.

Typical fusion expression vectors include pGEX (Pharmacia Biotech Inc; Smith and Johnson, 1988), pMAL (New England Biolabs, Beverly, MA) and pRIT5 (Pharmacia, Piscataway, N.J.), which fuse glutathione S-transferase (GST), maltose E binding protein, or protein A, respectively, to the target recombinant protein.

In one embodiment, the coding sequence of the Streptococcus pneumoniae polynucleotide is cloned into a pGEX expression vector to create a vector encoding a fusion protein comprising, from the N-terminus to the C-terminus, GST-thrombin cleavage site-Streptococcus pneumoniae polypeptide. The fusion protein can be purified by affinity chromatography using glutathione-agarose resin. Recombinant Streptococcus pneumoniae polypeptide unfused to GST can be recovered by cleavage of the fusion protein with thrombin.

Examples of suitable inducible non-fusion E. coli expression vectors include pTrc (Amann et al., 1988), pET 11d (Studier et al., 1990), pBAD and pCRT7. Target gene expression from the pTrc vector relies on host RNA polymerase transcription from a hybrid trp-lac fusion promoter. Target gene expression from the pET 11d vector relies on transcription from a T7 gn10-lac fusion promoter mediated by a coexpressed viral RNA polymerase T7 gn1. This viral polymerase is supplied by host strains BL21(DE3) or HMS174(DE3) from a resident prophage harboring a T7 gn1 gene under the transcriptional control of the lacUV5 promoter.

One strategy to maximize recombinant protein expression in E. coli is to express the protein in a host bacterium with an impaired capacity to proteolytically cleave the recombinant protein. Another strategy is to alter the nucleic acid sequence of the nucleic acid to be inserted into an expression vector so that the individual codons for each amino acid are those preferentially utilized in E. coli. Such alteration of nucleic acid sequences of the invention can be carried out by standard DNA mutagenesis or synthesis techniques.
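As a purely illustrative sketch of the second strategy, the Python fragment below re-encodes a polypeptide sequence using a single chosen codon for each amino acid. The table `PREFERRED_CODON` and the function `back_translate` are hypothetical names introduced here; the codon choices are an assumption made for illustration (each codon does encode the indicated amino acid under the standard genetic code), and a current E. coli codon usage table should be consulted before designing an actual construct.

```python
# Hypothetical one-codon-per-residue table (illustrative assumption only).
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",  # stop codon
}


def back_translate(protein: str) -> str:
    """Return a DNA coding sequence for `protein` using the table above.

    The encoded amino acid sequence is unchanged; only the codon choice
    is fixed, which is the essence of codon adaptation.
    """
    try:
        return "".join(PREFERRED_CODON[residue] for residue in protein.upper())
    except KeyError as exc:
        raise ValueError(f"unsupported residue: {exc.args[0]!r}") from None
```

Practical codon adaptation usually weighs further factors (for example, avoiding unwanted restriction sites or repeats), which this sketch does not attempt.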

In another embodiment, the Streptococcus pneumoniae polynucleotide expression vector is a yeast expression vector. Examples of vectors for expression in the yeast S. cerevisiae include pYepSec1 (Baldari, et al., 1987), pMFa (Kurjan and Herskowitz, 1982), pJRY88 (Schultz et al., 1987), and pYES2 (Invitrogen Corporation, San Diego, Calif.).

Alternatively, a Streptococcus pneumoniae polynucleotide can be expressed in insect cells using, for example, baculovirus expression vectors. Baculovirus vectors available for expression of proteins in cultured insect cells (e.g., Sf9 cells) include the pAc series (Smith et al., 1983) and the pVL series (Lucklow and Summers, 1989).

In yet another embodiment, a nucleic acid of the invention is expressed in mammalian cells using a mammalian expression vector. Examples of mammalian expression vectors include pCDM8 (Seed, 1987) and pMT2PC (Kaufman et al., 1987). When used in mammalian cells, the expression vector's control functions are often provided by viral regulatory elements.

As used hereinafter, a promoter is a region of a DNA molecule typically within about 100 nucleotide pairs in front of (upstream of) the point at which transcription begins (i.e., a transcription start site). That region typically contains several types of DNA sequence elements that are located in similar relative positions in different genes. As used hereinafter, the term “promoter” includes what is referred to in the art as an upstream promoter region, a promoter region or a promoter of a generalized eukaryotic RNA Polymerase II transcription unit.

Another type of discrete transcription regulatory sequence element is an enhancer. An enhancer provides specificity of time, location and expression level for a particular encoding region (e.g., gene). A major function of an enhancer is to increase the level of transcription of a coding sequence in a cell that contains one or more transcription factors that bind to that enhancer. Unlike a promoter, an enhancer can function when located at variable distances from transcription start sites so long as a promoter is present.

As used hereinafter, the phrase “enhancer-promoter” means a composite unit that contains both enhancer and promoter elements. An enhancer-promoter is operatively linked to a coding sequence that encodes at least one gene product. As used hereinafter, the phrase “operatively linked” means that an enhancer-promoter is connected to a coding sequence in such a way that the transcription of that coding sequence is controlled and regulated by that enhancer-promoter. Means for operatively linking an enhancer-promoter to a coding sequence are well known in the art. As is also well known in the art, the precise orientation and location relative to a coding sequence whose transcription is controlled is dependent, inter alia, upon the specific nature of the enhancer-promoter. Thus, a TATA box minimal promoter is typically located from about 25 to about 30 base pairs upstream of a transcription initiation site and an upstream promoter element is typically located from about 100 to about 200 base pairs upstream of a transcription initiation site. In contrast, an enhancer can be located downstream from the initiation site and can be at a considerable distance from that site.

An enhancer-promoter used in a vector construct of the present invention can be any enhancer-promoter that drives expression in a cell to be transfected. By employing an enhancer-promoter with well-known properties, the level and pattern of gene product expression can be optimized.

For example, commonly used promoters are derived from polyoma, Adenovirus 2, cytomegalovirus and Simian Virus 40. For other suitable expression systems for both prokaryotic and eukaryotic cells, see chapters 16 and 17 of Sambrook et al., “Molecular Cloning: A Laboratory Manual,” 2nd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989, incorporated hereinafter by reference.

In another embodiment, the recombinant mammalian expression vector is capable of directing expression of the nucleic acid preferentially in a particular cell type (e.g., tissue-specific regulatory elements are used to express the nucleic acid). Tissue-specific regulatory elements are known in the art. Non-limiting examples of suitable tissue-specific promoters include the albumin promoter (liver-specific; Pinkert et al., 1987), lymphoid-specific promoters (Calame and Eaton, 1988), in particular, promoters of T cell receptors (Winoto and Baltimore, 1989) and immunoglobulins (Banerji et al., 1983; Queen and Baltimore, 1983), neuron-specific promoters (e.g., the neurofilament promoter; Byrne and Ruddle, 1989), pancreas-specific promoters (Edlund et al., 1985), and mammary gland-specific promoters (e.g., milk whey promoter; U.S. Pat. No. 4,873,316 and International Application EP 264,166). Developmentally-regulated promoters are also encompassed, for example the murine hox promoters (Kessel and Gruss, 1990) and the α-fetoprotein promoter (Campes and Tilghman, 1989).

The invention further provides a recombinant expression vector comprising a DNA molecule encoding a Streptococcus pneumoniae polypeptide cloned into the expression vector in an antisense orientation. That is, the DNA molecule is operatively linked to a regulatory sequence in a manner which allows for expression (by transcription of the DNA molecule) of an RNA molecule which is antisense to Streptococcus pneumoniae mRNA. Regulatory sequences operatively linked to a nucleic acid cloned in the antisense orientation can be chosen which direct the continuous expression of the antisense RNA molecule in a variety of cell types. For instance, viral promoters and/or enhancers, or regulatory sequences, can be chosen which direct constitutive, tissue specific or cell type specific expression of antisense RNA. The antisense expression vector can be in the form of a recombinant plasmid, phagemid or attenuated virus in which antisense nucleic acids are produced under the control of a high efficiency regulatory region, the activity of which can be determined by the cell type into which the vector is introduced.
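To illustrate what the antisense orientation means at the sequence level, the following minimal Python sketch computes the reverse complement of a coding-strand DNA sequence. The function name `reverse_complement` is a hypothetical helper introduced for this illustration; when an insert is placed in this orientation downstream of a promoter, the resulting transcript is complementary to the corresponding sense mRNA.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    allowed = set("ACGTacgt")
    if not set(dna) <= allowed:
        raise ValueError("sequence contains characters other than A, C, G, T")
    # Complement each base, then reverse so the result also reads 5' to 3'.
    return dna.translate(_COMPLEMENT)[::-1]
```

For example, the sense fragment ATGGCC corresponds to the antisense-orientation insert GGCCAT.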

Another aspect of the invention pertains to host cells into which a recombinant expression vector of the invention has been introduced. The terms “host cell” and “recombinant host cell” are used interchangeably hereinafter. It is understood that such terms refer not only to the particular subject cell, but to the progeny or potential progeny of such a cell. Because certain modifications may occur in succeeding generations due to either mutation or environmental influences, such progeny may not, in fact, be identical to the parent cell, but are still included within the scope of the term as used hereinafter. A host cell can be any prokaryotic or eukaryotic cell. For example, a Streptococcus pneumoniae polypeptide can be expressed in bacterial cells such as E. coli, insect cells (such as Sf9, Sf21), yeast or mammalian cells (such as Chinese hamster ovary cells (CHO), VERO, chick embryo fibroblasts, BHK cells or COS cells). Other suitable host cells are known to those skilled in the art.

Vector DNA is introduced into prokaryotic or eukaryotic cells via conventional transformation, infection or transfection techniques. As used hereinafter, the terms “transformation” and “transfection” are intended to refer to a variety of art-recognized techniques for introducing foreign nucleic acid (e.g., DNA) into a host cell, including calcium phosphate or calcium chloride co-precipitation, DEAE-dextran-mediated transfection, lipofection, ultrasound or electroporation. Suitable methods for transforming or transfecting host cells can be found in Sambrook, et al. (“Molecular Cloning: A Laboratory Manual,” 2nd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989), and other laboratory manuals.

A host cell of the invention, such as a prokaryotic or eukaryotic host cell in culture, can be used to produce (i.e., express) a Streptococcus pneumoniae polypeptide. Accordingly, the invention further provides methods for producing a Streptococcus pneumoniae polypeptide using the host cells of the invention. In one embodiment, the method comprises culturing the host cell of the invention (into which a recombinant expression vector encoding a Streptococcus pneumoniae polypeptide has been introduced) in a suitable medium until the Streptococcus pneumoniae polypeptide is produced. In another embodiment, the method further comprises isolating the Streptococcus pneumoniae polypeptide from the medium or the host cell.

A coding sequence of an expression vector is operatively linked to a transcription termination region. RNA polymerase transcribes an encoding DNA sequence through a site where polyadenylation occurs. Typically, DNA sequences located a few hundred base pairs downstream of the polyadenylation site serve to terminate transcription. Those DNA sequences are referred to hereinafter as transcription-termination regions. Those regions are required for efficient polyadenylation of transcribed messenger RNA (mRNA). Transcription-termination regions are well known in the art. A preferred transcription-termination region used in an adenovirus vector construct of the present invention comprises a polyadenylation signal of SV40 or the protamine gene.

An expression vector comprises a polynucleotide that encodes a Streptococcus pneumoniae polypeptide. Such a polynucleotide is meant to include a sequence of nucleotide bases encoding a Streptococcus pneumoniae polypeptide sufficient in length to distinguish the segment from a polynucleotide segment encoding a non-Streptococcus pneumoniae polypeptide. A polynucleotide of the invention can also encode biologically functional polypeptides or peptides which have variant amino acid sequences, such as with changes selected based on considerations such as the relative hydropathic score of the amino acids being exchanged. These variant sequences are those isolated from natural sources or induced in the sequences disclosed hereinafter using a mutagenic procedure such as site-directed mutagenesis.

Preferably, the expression vectors of the present invention comprise polynucleotides that encode polypeptides comprising the amino acid residue sequence of one of SEQ ID NO:216 through SEQ ID NO:430 or SEQ ID NO: 592 through SEQ ID NO: 752. An expression vector can include a Streptococcus pneumoniae polypeptide coding region itself of any of the Streptococcus pneumoniae polypeptides noted above or it can contain coding regions bearing selected alterations or modifications in the basic coding region of such a Streptococcus pneumoniae polypeptide. Alternatively, such vectors or fragments can code for larger polypeptides or polypeptides which nevertheless include the basic coding region. In any event, it should be appreciated that due to codon redundancy as well as biological functional equivalence, this aspect of the invention is not limited to the particular DNA molecules corresponding to the polypeptide sequences noted above.

Exemplary vectors include the mammalian expression vectors of the pCMV family including pCMV6b and pCMV6c (Chiron Corp., Emeryville, Calif.). In certain cases, and specifically in the case of these individual mammalian expression vectors, the resulting constructs can require co-transfection with a vector containing a selectable marker such as pSV2neo. Via co-transfection into a dihydrofolate reductase-deficient Chinese hamster ovary cell line, such as DG44, clones expressing Streptococcus pneumoniae polypeptides by virtue of DNA incorporated into such expression vectors can be detected.

A DNA molecule of the present invention can be incorporated into a vector by a number of techniques that are well known in the art. For instance, the vector pUC18 has been demonstrated to be of particular value in cloning and expression of genes. Likewise, the related vectors M13mp18 and M13mp19 can be used in certain embodiments of the invention, in particular, in performing dideoxy sequencing.

An expression vector of the present invention is useful both as a means for preparing quantities of the Streptococcus pneumoniae polypeptide-encoding DNA itself, and as a means for preparing the encoded polypeptide and peptides. It is contemplated that where Streptococcus pneumoniae polypeptides of the invention are made by recombinant means, one can employ either prokaryotic or eukaryotic expression vectors as shuttle systems.

In another aspect, the recombinant host cells of the present invention are prokaryotic host cells. Preferably, the recombinant host cells of the invention are bacterial cells of the DH5α strain of Escherichia coli. In general, prokaryotes are preferred for the initial cloning of DNA sequences and constructing the vectors useful in the invention. For example, E. coli K12 strains can be particularly useful. Other microbial strains that can be used include E. coli B, and E. coli χ1776 (ATCC No. 31537). These examples are, of course, intended to be illustrative rather than limiting.

The aforementioned strains, as well as E. coli W3110 (ATCC No. 273325), E. coli BL21(DE3), E. coli Top10, bacilli such as Bacillus subtilis, or other Enterobacteriaceae such as Salmonella typhimurium (or other attenuated Salmonella strains as described in U.S. Pat. No. 4,837,151) or Serratia marcescens, and various Pseudomonas species can be used.

In general, plasmid vectors containing replicon and control sequences which are derived from species compatible with the host cell are used in connection with these hosts. The vector ordinarily carries a replication site, as well as marking sequences which are capable of providing phenotypic selection in transformed cells. For example, E. coli can be transformed using pBR322, a plasmid derived from an E. coli species (Bolivar, et al. 1977). pBR322 contains genes for ampicillin and tetracycline resistance and thus provides easy means for identifying transformed cells. The pBR plasmid, or other microbial plasmid or phage, must also contain, or be modified to contain, promoters which can be used by the microbial organism for expression of its own polypeptides.

Those promoters most commonly used in recombinant DNA construction include the β-lactamase (penicillinase) and lactose promoter systems (Chang, et al. 1978; Itakura, et al. 1977; Goeddel, et al. 1979; Goeddel, et al. 1980) and a tryptophan (TRP) promoter system (EP 0036776; Siebwenlist et al. 1980). While these are the most commonly used, other microbial promoters have been discovered and utilized, and details concerning their nucleotide sequences have been published, enabling a skilled worker to introduce functional promoters into plasmid vectors (Siebwenlist, et al. 1980).

In addition to prokaryotes, eukaryotic microbes such as yeast can also be used. Saccharomyces cerevisiae, or common baker's yeast, is the most commonly used among eukaryotic microorganisms, although a number of other strains are commonly available. For expression in Saccharomyces, the plasmid YRp7, for example, is commonly used (Stinchcomb, et al. 1979; Kingsman, et al. 1979; Tschemper, et al. 1980). This plasmid already contains the trp1 gene, which provides a selection marker for a mutant strain of yeast lacking the ability to grow in the absence of tryptophan, for example ATCC No. 44076 or PEP4-1 (Jones, 1977). The presence of the trp1 lesion as a characteristic of the yeast host cell genome then provides an effective environment for detecting transformation by growth in the absence of tryptophan.

Suitable promoter sequences in yeast vectors include the promoters for 3-phosphoglycerate kinase (Hitzeman, et al. 1980) or other glycolytic enzymes (Hess, et al. 1968; Holland, et al. 1978) such as enolase, glyceraldehyde-3-phosphate dehydrogenase, hexokinase, pyruvate decarboxylase, phosphofructokinase, glucose-6-phosphate isomerase, 3-phosphoglycerate mutase, pyruvate kinase, triosephosphate isomerase, phosphoglucose isomerase, and glucokinase. In constructing suitable expression plasmids, the termination sequences associated with these genes are also introduced into the expression vector downstream from the sequences to be expressed to provide polyadenylation of the mRNA and termination. Other promoters, which have the additional advantage of transcription controlled by growth conditions, are the promoter regions for alcohol dehydrogenase 2, isocytochrome C, acid phosphatase, degradative enzymes associated with nitrogen metabolism, and the aforementioned glyceraldehyde-3-phosphate dehydrogenase, and enzymes responsible for maltose and galactose utilization. Any plasmid vector containing a yeast-compatible promoter, origin of replication and termination sequences is suitable.

In addition to microorganisms, cultures of cells derived from multicellular organisms can also be used as hosts. In principle, any such cell culture is workable, whether from vertebrate or invertebrate culture. However, interest has been greatest in vertebrate cells, and propagation of vertebrate cells in culture (tissue culture) has become a routine procedure in recent years. Examples of such useful host cell lines are AtT-20, VERO, HeLa, NSO, PER C6, Chinese hamster ovary (CHO) cell lines, and WI38, BHK, COSM6, COS-7, 293 and MDCK cell lines. Expression vectors for such cells ordinarily include (if necessary) an origin of replication, a promoter located upstream of the gene to be expressed, along with any necessary ribosome binding sites, RNA splice sites, polyadenylation site, and transcriptional terminator sequences.

Where expression of recombinant Streptococcus pneumoniae polypeptides is desired and a eukaryotic host is contemplated, it is most desirable to employ a vector, such as a plasmid, that incorporates a eukaryotic origin of replication. Additionally, for the purposes of expression in eukaryotic systems, one desires to position the Streptococcus pneumoniae encoding sequence adjacent to and under the control of an effective eukaryotic promoter such as promoters used in combination with Chinese hamster ovary cells. To bring a coding sequence under control of a promoter, whether it is eukaryotic or prokaryotic, the 5′ end of the translation initiation region of the proper translational reading frame of the polypeptide must be positioned between about 1 and about 50 nucleotides 3′ of or downstream with respect to the promoter chosen. Furthermore, where eukaryotic expression is anticipated, one would typically desire to incorporate into the transcriptional unit which includes the Streptococcus pneumoniae polypeptide an appropriate polyadenylation site.
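The spacing guideline stated above (about 1 to about 50 nucleotides between the promoter and the 5′ end of the translation initiation region) can be checked on a construct design with a short Python sketch such as the following. The function name `start_within_promoter_window`, the 0-based coordinate convention, and the treatment of the bounds as hard limits of 1 and 50 are all assumptions made for illustration; the text itself gives only approximate bounds.

```python
def start_within_promoter_window(
    promoter_end: int,
    translation_start: int,
    min_spacing: int = 1,
    max_spacing: int = 50,
) -> bool:
    """Check that the translation start lies 1 to 50 nt 3' of the promoter.

    promoter_end: 0-based index of the last nucleotide of the promoter.
    translation_start: 0-based index of the first nucleotide of the
        translation initiation region (e.g., the A of the ATG).
    The spacing is the number of positions the start lies downstream of
    the promoter's last nucleotide.
    """
    spacing = translation_start - promoter_end
    return min_spacing <= spacing <= max_spacing
```

For example, with a promoter ending at index 99, a start codon at index 120 (spacing 21) would satisfy the check, whereas one at index 200 (spacing 101) would not.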

Means of transforming or transfecting cells with exogenous polynucleotide such as DNA molecules are well known in the art and include techniques such as calcium-phosphate- or DEAE-dextran-mediated transfection, protoplast fusion, electroporation, liposome mediated transfection, direct microinjection and adenovirus infection (see e.g., Sambrook, Fritsch and Maniatis, 1989).

The most widely used method is transfection mediated by either calcium phosphate or DEAE-dextran. Although the mechanism remains obscure, it is believed that the transfected DNA enters the cytoplasm of the cell by endocytosis and is transported to the nucleus. Depending on the cell type, up to 90% of a population of cultured cells can be transfected at any one time. Because of its high efficiency, transfection mediated by calcium phosphate or DEAE-dextran is the method of choice for experiments that require transient expression of the foreign DNA in large numbers of cells. Calcium phosphate-mediated transfection is also used to establish cell lines that integrate copies of the foreign DNA, which are usually arranged in head-to-tail tandem arrays, into the host cell genome.

In the protoplast fusion method, protoplasts derived from bacteria carrying high numbers of copies of a plasmid of interest are mixed directly with cultured mammalian cells. After fusion of the cell membranes (usually with polyethylene glycol), the contents of the bacteria are delivered into the cytoplasm of the mammalian cells and the plasmid DNA is transported to the nucleus. Protoplast fusion is not as efficient as transfection for many of the cell lines that are commonly used for transient expression assays, but it is useful for cell lines in which endocytosis of DNA occurs inefficiently. Protoplast fusion frequently yields multiple copies of the plasmid DNA tandemly integrated into the host chromosome.

The application of brief, high-voltage electric pulses to a variety of mammalian and plant cells leads to the formation of nanometer-sized pores in the plasma membrane. DNA is taken directly into the cell cytoplasm either through these pores or as a consequence of the redistribution of membrane components that accompanies closure of the pores. Electroporation can be extremely efficient and can be used both for transient expression of cloned genes and for establishment of cell lines that carry integrated copies of the gene of interest. Electroporation, in contrast to calcium phosphate-mediated transfection and protoplast fusion, frequently gives rise to cell lines that carry one, or at most a few, integrated copies of the foreign DNA.

Liposome transfection involves encapsulation of DNA and RNA within liposomes, followed by fusion of the liposomes with the cell membrane. The mechanism of how DNA is delivered into the cell is unclear, but transfection efficiencies can be as high as 90%.

Direct microinjection of a DNA molecule into nuclei has the advantage of not exposing DNA to cellular compartments such as low-pH endosomes. Microinjection is therefore used primarily as a method to establish lines of cells that carry integrated copies of the DNA of interest.

The use of adenovirus as a vector for cell transfection is well known in the art. Adenovirus vector-mediated cell transfection has been reported for various cells (Stratford-Perricaudet, et al. 1992).

A transfected cell can be prokaryotic or eukaryotic. Preferably, the host cells of the invention are prokaryotic host cells. Where it is of interest to produce a Streptococcus pneumoniae polypeptide, cultured prokaryotic host cells are of particular interest.

In yet another embodiment, the present invention contemplates a process or method of preparing Streptococcus pneumoniae polypeptides comprising transforming, transfecting or infecting cells with a polynucleotide that encodes a Streptococcus pneumoniae polypeptide to produce transformed host cells; and maintaining the transformed host cells under biological conditions sufficient for expression of the polypeptide. Preferably, the transformed host cells are prokaryotic cells. Alternatively, the host cells are eukaryotic cells. More preferably, the prokaryotic cells are bacterial cells of the DH5α strain of Escherichia coli. Even more preferably, the polynucleotide transfected into the transformed cells comprises the nucleic acid sequence of one of SEQ ID NO: 1 through SEQ ID NO: 215 or SEQ ID NO: 431 through SEQ ID NO: 591. Additionally, transfection is accomplished using an expression vector disclosed above. A host cell used in the process is capable of expressing a functional, recombinant Streptococcus pneumoniae polypeptide.

Following transfection, the cell is maintained under culture conditions for a period of time sufficient for expression of a Streptococcus pneumoniae polypeptide. Culture conditions are well known in the art and include ionic composition and concentration, temperature, pH and the like. Typically, transfected cells are maintained under culture conditions in a culture medium. Suitable media for various cell types are well known in the art. In a preferred embodiment, temperature is from about 20° C. to about 50° C., more preferably from about 30° C. to about 40° C. and, even more preferably, about 37° C.

The pH is preferably from about 6.0 to about 8.0, more preferably from about 6.8 to about 7.8 and, most preferably, about 7.4. Osmolality is preferably from about 200 milliosmols per liter (mosm/L) to about 400 mosm/L and, more preferably, from about 290 mosm/L to about 310 mosm/L. Other biological conditions needed for transfection and expression of an encoded protein are well known in the art.

Transfected cells are maintained for a period of time sufficient for expression of a Streptococcus pneumoniae polypeptide. A suitable time depends, inter alia, upon the cell type used and is readily determinable by a skilled artisan. Typically, maintenance time is from about 2 to about 14 days.

Recombinant Streptococcus pneumoniae polypeptide is recovered or collected either from the transfected cells or the medium in which those cells are cultured. Recovery comprises isolating and purifying the Streptococcus pneumoniae polypeptide. Isolation and purification techniques for polypeptides are well known in the art and include such procedures as precipitation, filtration, chromatography, electrophoresis and the like.

F. Antibodies Immunoreactive with Streptococcus pneumoniae Polypeptides

In still another embodiment, the present invention provides antibodies immunoreactive with Streptococcus pneumoniae polypeptides. Preferably, the antibodies of the invention are monoclonal antibodies. Additionally, the Streptococcus pneumoniae polypeptides comprise the amino acid residue sequence of one of SEQ ID NO: 216 through SEQ ID NO: 430 or SEQ ID NO: 592 through SEQ ID NO: 752. Means for preparing and characterizing antibodies are well known in the art (see, e.g., "Antibodies: A Laboratory Manual", E. Harlow and D. Lane, Cold Spring Harbor Laboratory, 1988).

Briefly, a polyclonal antibody is prepared by immunizing an animal with an immunogen comprising a polypeptide or polynucleotide of the present invention, and collecting antisera from that immunized animal. A wide range of animal species can be used for the production of antisera. Typically an animal used for production of antisera is a rabbit, a mouse, a rat, a hamster or a guinea pig. Because of the relatively large blood volume of rabbits, a rabbit is a preferred choice for production of polyclonal antibodies.

As is well known in the art, a given polypeptide or polynucleotide may vary in its immunogenicity. It is often necessary therefore to couple the immunogen (e.g., a polypeptide or polynucleotide) of the present invention with a carrier. Exemplary and preferred carriers are CRM₁₉₇, keyhole limpet hemocyanin (KLH) and bovine serum albumin (BSA). Other albumins such as ovalbumin, mouse serum albumin or rabbit serum albumin can also be used as carriers.

Means for conjugating a polypeptide or a polynucleotide to a carrier protein are well known in the art and include glutaraldehyde, m-maleimidobenzoyl-N-hydroxysuccinimide ester, carbodiimide and bis-diazotized benzidine.

The amount of immunogen used for the production of polyclonal antibodies varies depending, inter alia, upon the nature of the immunogen as well as the animal used for immunization. A variety of routes can be used to administer the immunogen (subcutaneous, intramuscular, intradermal, intravenous and intraperitoneal). The production of polyclonal antibodies is monitored by sampling blood of the immunized animal at various points following immunization. When a desired level of immunogenicity is obtained, the immunized animal can be bled and the serum isolated and stored.

In another aspect, the present invention contemplates a process of producing an antibody immunoreactive with a Streptococcus pneumoniae polypeptide comprising the steps of (a) transfecting recombinant host cells with a polynucleotide that encodes a Streptococcus pneumoniae polypeptide; (b) culturing the host cells under conditions sufficient for expression of the polypeptide; (c) recovering the polypeptides; and (d) preparing the antibodies to the polypeptides. Preferably, the host cell is transfected with the polynucleotide of one of SEQ ID NO: 1 through SEQ ID NO: 215 or SEQ ID NO: 431 through SEQ ID NO: 591. Even more preferably, the present invention provides antibodies prepared according to the process described above.

A monoclonal antibody of the present invention can be readily prepared through use of well-known techniques such as those exemplified in U.S. Pat. No. 4,196,265, hereinafter incorporated by reference. Typically, a technique involves first immunizing a suitable animal with a selected antigen (e.g., a polypeptide or polynucleotide of the present invention) in a manner sufficient to provide an immune response. Rodents, such as mice and rats, are preferred animals. Spleen cells from the immunized animal are then fused with cells of an immortal myeloma cell line. Where the immunized animal is a mouse, a preferred myeloma cell is a murine NS-1 myeloma cell.

The fused spleen/myeloma cells are cultured in a selective medium to select fused spleen/myeloma cells from the parental cells. Fused cells are separated from the mixture of non-fused parental cells, e.g., by the addition of agents that block the de novo synthesis of nucleotides in the tissue culture media. Exemplary and preferred agents are aminopterin, methotrexate, and azaserine. Aminopterin and methotrexate block de novo synthesis of both purines and pyrimidines, whereas azaserine blocks only purine synthesis. Where aminopterin or methotrexate is used, the media is supplemented with hypoxanthine and thymidine as a source of nucleotides. Where azaserine is used, the media is supplemented with hypoxanthine.

This culturing provides a population of hybridomas from which specific hybridomas are selected. Typically, selection of hybridomas is performed by culturing the cells by single-clone dilution in microtiter plates, followed by testing the individual clonal supernatants for reactivity with an antigen-polypeptide. The selected clones can then be propagated indefinitely to provide the monoclonal antibody.

By way of specific example, to produce an antibody of the present invention, mice are injected intraperitoneally with between about 1 and about 200 μg of an antigen comprising a polypeptide of the present invention. B lymphocyte cells are stimulated to grow by injecting the antigen in association with an adjuvant such as complete Freund's adjuvant (a non-specific stimulator of the immune response containing killed Mycobacterium tuberculosis). At some time (e.g., at least two weeks) after the first injection, mice are boosted by injection with a second dose of the antigen mixed with incomplete Freund's adjuvant.

A few weeks after the second injection, mice are tail bled and the sera titered by immunoprecipitation against radiolabeled antigen. Preferably, the process of boosting and titering is repeated until a suitable titer is achieved. The spleen of the mouse with the highest titer is removed and the spleen lymphocytes are obtained by homogenizing the spleen with a syringe. Typically, a spleen from an immunized mouse contains approximately 5×10⁷ to 2×10⁸ lymphocytes.

Mutant lymphocyte cells known as myeloma cells are obtained from laboratory animals in which such cells have been induced to grow by a variety of well-known methods. Myeloma cells lack the salvage pathway of nucleotide biosynthesis. Because myeloma cells are tumor cells, they can be propagated indefinitely in tissue culture, and are thus denominated immortal. Numerous cultured cell lines of myeloma cells from mice and rats, such as murine NS-1 myeloma cells, have been established.

Myeloma cells are combined under conditions appropriate to foster fusion with the normal antibody-producing cells from the spleen of the mouse or rat injected with the antigen/polypeptide of the present invention. Fusion conditions include, for example, the presence of polyethylene glycol. The resulting fused cells are hybridoma cells. Like myeloma cells, hybridoma cells grow indefinitely in culture.

Hybridoma cells are separated from unfused myeloma cells by culturing in a selection medium such as HAT media (hypoxanthine, aminopterin, thymidine). Unfused myeloma cells lack the enzymes necessary to synthesize nucleotides via the salvage pathway and are therefore killed in the presence of aminopterin, methotrexate, or azaserine. Unfused lymphocytes also do not continue to grow in tissue culture. Thus, only cells that have successfully fused (hybridoma cells) can grow in the selection media.

Each of the surviving hybridoma cells produces a single antibody. These cells are then screened for the production of the specific antibody immunoreactive with an antigen/polypeptide of the present invention. Single cell hybridomas are isolated by limiting dilutions of the hybridomas. The hybridomas are serially diluted many times and, after the dilutions are allowed to grow, the supernatant is tested for the presence of the monoclonal antibody. The clones producing that antibody are then cultured in large amounts to produce an antibody of the present invention in convenient quantity.
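The limiting dilution step described above can be illustrated with a short calculation. The following is only a sketch under a stated assumption that is not part of this disclosure: that the number of hybridoma cells seeded per well follows a Poisson distribution. The function names and the example seeding densities are hypothetical.

```python
import math


def poisson_pmf(k: int, mean_cells_per_well: float) -> float:
    """Probability that a well receives exactly k cells (Poisson assumption)."""
    return (
        math.exp(-mean_cells_per_well)
        * mean_cells_per_well ** k
        / math.factorial(k)
    )


def clonal_fraction(mean_cells_per_well: float) -> float:
    """Fraction of cell-containing wells expected to hold exactly one cell.

    mean_cells_per_well must be greater than zero. This is the share of
    growth-positive wells that would be derived from a single cell,
    assuming every seeded cell grows.
    """
    p_empty = poisson_pmf(0, mean_cells_per_well)
    p_single = poisson_pmf(1, mean_cells_per_well)
    return p_single / (1.0 - p_empty)


if __name__ == "__main__":
    # Hypothetical seeding densities, in average cells per well.
    for density in (0.3, 1.0, 3.0):
        print(density, round(clonal_fraction(density), 3))
```

Under this assumption, lower seeding densities leave more wells empty but make it more likely that a well showing growth arose from a single cell, which is why the hybridomas are diluted many times before supernatants are tested.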

By use of a monoclonal antibody of the present invention, specific polypeptides and polynucleotides of the invention are identified as antigens. Once identified, those polypeptides and polynucleotides are isolated and purified by techniques such as antibody-affinity chromatography. In antibody-affinity chromatography, a monoclonal antibody is bound to a solid substrate and exposed to a solution containing the desired antigen. The antigen is removed from the solution through an immunospecific reaction with the bound antibody. The polypeptide or polynucleotide is then easily removed from the substrate and purified.

Additionally, examples of methods and reagents particularly amenable for use in generating and screening an antibody display library can be found in, for example, U.S. Pat. No. 5,223,409; International Application WO 92/18619; International Application WO 91/17271; International Application WO 92/20791; International Application WO 92/15679; International Application WO 93/01288; International Application WO 92/01047; International Application WO 92/09690; and International Application WO 90/02809.

Additionally, recombinant anti-Streptococcus pneumoniae antibodies, such as chimeric and humanized monoclonal antibodies, comprising both human and non-human fragments, which can be made using standard recombinant DNA techniques, are within the scope of the invention. Such chimeric and humanized monoclonal antibodies can be produced by recombinant DNA techniques known in the art, for example using methods described in International Application PCT/US86/02269; European Patent Application EP 184,187; European Patent Application EP 171,496; European Patent Application EP 173,494; International Application WO 86/01533; U.S. Pat. No. 4,816,567; and European Patent Application EP 125,023.

An anti-Streptococcus pneumoniae antibody (e.g., monoclonal antibody) is used to isolate Streptococcus pneumoniae polypeptides by standard techniques, such as affinity chromatography or immunoprecipitation. An anti-Streptococcus pneumoniae antibody facilitates the purification of a natural Streptococcus pneumoniae polypeptide from cells and recombinantly produced Streptococcus pneumoniae polypeptides expressed in host cells. Moreover, an anti-Streptococcus pneumoniae antibody is used to detect Streptococcus pneumoniae polypeptide (e.g., in a cellular lysate or cell supernatant) in order to evaluate the abundance of the Streptococcus pneumoniae polypeptide. The detection of circulating fragments of a Streptococcus pneumoniae polypeptide is used to identify Streptococcus pneumoniae polypeptide turnover in a subject. Anti-Streptococcus pneumoniae antibodies are used diagnostically to monitor protein levels in tissue as part of a clinical testing procedure, e.g., to determine the efficacy of a given treatment regimen. Detection is facilitated by coupling (i.e., physically linking) the antibody to a detectable substance. Examples of detectable substances include various enzymes, prosthetic groups, fluorescent materials, luminescent materials, bioluminescent materials, and radioactive materials. Examples of suitable enzymes include horseradish peroxidase, alkaline phosphatase, β-galactosidase, or acetylcholinesterase; examples of suitable prosthetic group complexes include streptavidin/biotin and avidin/biotin; examples of suitable fluorescent materials include umbelliferone, fluorescein, fluorescein isothiocyanate, rhodamine, dichlorotriazinylamine fluorescein, dansyl chloride or phycoerythrin; an example of a luminescent material is luminol; examples of bioluminescent materials include luciferase, luciferin, and aequorin; and examples of suitable radioactive materials include ¹²⁵I, ¹³¹I, ³⁵S or ³H.

G. Pharmaceutical and Immunogenic Compositions

In certain embodiments, the present invention provides pharmaceutical and immunogenic compositions comprising Streptococcus pneumoniae polypeptides and physiologically acceptable carriers. More preferably, the pharmaceutical compositions comprise one or more Streptococcus pneumoniae polypeptides comprising the amino acid residue sequence of one or more of SEQ ID NO: 216 through SEQ ID NO: 430 or SEQ ID NO: 592 through SEQ ID NO: 752. In other embodiments, the pharmaceutical compositions of the invention comprise polynucleotides that encode Streptococcus pneumoniae polypeptides, and physiologically acceptable carriers. Preferably, the pharmaceutical and immunogenic compositions of the present invention comprise Streptococcus pneumoniae polypeptides comprising the amino acid sequence of one of SEQ ID NO: 216 through SEQ ID NO: 430 or SEQ ID NO: 592 through SEQ ID NO: 752. Alternatively, the pharmaceutical and immunogenic compositions comprise polynucleotides comprising the nucleotide sequence of one of SEQ ID NO: 1 through SEQ ID NO: 215 or SEQ ID NO: 431 through SEQ ID NO: 591.

Various tests are used to assess the in vitro immunogenicity of the polypeptides of the invention. For example, an in vitro opsonic assay is conducted by incubating together a mixture of Streptococcus pneumoniae cells, heat inactivated human serum containing specific antibodies to the polypeptide in question, and an exogenous complement source. Opsonophagocytosis proceeds during incubation of freshly isolated human polymorphonuclear cells (PMN's) and the antibody/complement/pneumococcal cell mixture. Bacterial cells that are coated with antibody and complement are killed upon opsonophagocytosis. Colony forming units (cfu) of surviving bacteria that escape from opsonophagocytosis are determined by plating the assay mixture. Titers are reported as the reciprocal of the highest dilution that gives ≥50% bacterial killing, as determined by comparison to assay controls. Specimens which demonstrate less than 50% killing at the lowest serum dilution tested (1:8) are reported as having an OPA titer of 4. The highest dilution tested is 1:2560. Samples with 50% killing at the highest dilution are repeated, beginning with a higher initial dilution. The method described above is a modification of Gray's method (Gray, 1990).

A test serum control, which contains test serum plus bacterial cells and heat inactivated complement, is included for each individual serum. This control can be used to assess whether antibiotics or other serum components present are capable of killing the bacterial strain directly (i.e., in the absence of complement or PMN's). A human serum with known opsonic titer is used as a positive human serum control. The opsonic antibody titer for each unknown serum can be calculated as the reciprocal of the initial dilution of serum giving 50% cfu reduction compared to the control without serum.
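The titer reporting rule described above can be sketched as a short calculation. This is a minimal illustration only, not the assay software: the function names, the data layout (reciprocal dilution mapped to percent killing) and the example values are hypothetical, while the 50% threshold, the 1:8 and 1:2560 dilution limits and the reported titer of 4 are taken from the description.

```python
from typing import Mapping


def percent_killing(cfu_sample: float, cfu_control: float) -> float:
    """Percent reduction in cfu relative to the control without serum."""
    return 100.0 * (1.0 - cfu_sample / cfu_control)


def opa_titer(
    killing_by_dilution: Mapping[int, float],
    threshold: float = 50.0,
    below_range_titer: int = 4,
) -> int:
    """Reciprocal of the highest dilution giving at least `threshold` % killing.

    Keys are reciprocal dilutions (8 means 1:8). If no dilution reaches the
    threshold, the specimen is reported with the below-range titer of 4.
    """
    passing = [d for d, pct in killing_by_dilution.items() if pct >= threshold]
    return max(passing) if passing else below_range_titer


def needs_repeat(
    killing_by_dilution: Mapping[int, float],
    highest_dilution: int = 2560,
    threshold: float = 50.0,
) -> bool:
    """True if killing still meets the threshold at the highest dilution tested."""
    return killing_by_dilution.get(highest_dilution, 0.0) >= threshold


if __name__ == "__main__":
    # Hypothetical percent-killing values for one serum.
    serum = {8: 90.0, 16: 75.0, 32: 55.0, 64: 30.0}
    print(opa_titer(serum), needs_repeat(serum))
```

A serum that still meets the threshold at 1:2560 would be flagged by `needs_repeat` so that it can be rerun beginning with a higher initial dilution, as described above.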

A whole cell ELISA assay is also used to assess in vitro immunogenicity and surface exposure of the polypeptide antigen, wherein the bacterial strain of interest (S. pneumoniae) is coated onto a plate, such as a 96 well plate, and test sera from an immunized animal are reacted with the bacterial cells. If any antibody specific for the test polypeptide antigen is reactive with a surface exposed epitope of the polypeptide antigen, it can be detected by standard methods known to one skilled in the art.

Any polypeptide demonstrating the desired in vitro activity is then tested in an in vivo animal challenge model. In certain embodiments, immunogenic compositions are used in the immunization of an animal (e.g., a mouse) by methods and routes of immunization known to those of skill in the art (e.g., intranasal, parenteral, oral, rectal, vaginal, transdermal, intraperitoneal, intravenous, subcutaneous, etc.). Following immunization of the animal with a particular Streptococcus pneumoniae immunogenic composition, the animal is challenged with Streptococcus pneumoniae and assayed for resistance to Streptococcus pneumoniae infection.

In one embodiment, six-week-old, pathogen-free, BALB/c mice are immunized and challenged with Streptococcus pneumoniae. For example, BALB/c mice, at 10 animals per group, are immunized (by slow instillation into the nostrils of each mouse) with one or more doses of the desired polypeptide in an immunogenic composition. Streptococcus pneumoniae colonizes the nasopharynx of BALB/c mice, but does not cause disease or death. Subsequently, the BALB/c mice are challenged with streptomycin-resistant Streptococcus pneumoniae. The BALB/c mice are sacrificed post-challenge, and the noses are removed and homogenized in sterile saline. The homogenate is diluted in saline and plated on streptomycin-containing TSA plates. Plates are incubated overnight at 37° C. and then colonies are counted. Statistically significant reduction of nasopharyngeal colonization indicates that the polypeptide is suitable for use in human clinical trials.

In another embodiment, six-week-old, pathogen-free, male CBA/CaHN xid/J (CBA/N) mice are immunized intranasally or parenterally prior to Streptococcus pneumoniae challenge. CBA/N mice, at 10 animals per group, are immunized with an appropriate amount of the desired polypeptide in an immunogenic composition to be tested. CBA/N mice are immunodeficient (XID) and, when challenged with appropriate Streptococcus pneumoniae, develop nasopharyngeal colonization, bacteremia and death.

The CBA/N mice are immunized intranasally or subcutaneously with one or more doses of the desired immunogenic composition. Subsequently, the CBA/N mice are challenged with streptomycin-resistant Streptococcus pneumoniae. To determine the effects of immunization on intranasal colonization, the CBA/N mice are sacrificed post-challenge, and the noses are removed and homogenized in sterile saline. The homogenate is serially diluted in saline and plated on streptomycin-containing TSA plates. In addition, blood collected post-challenge from each mouse is also plated on streptomycin-containing TSA plates to determine levels of bacteremia. Plates are incubated overnight at 37° C. and then colonies are counted. In another embodiment, CBA/N mice are immunized as described above and challenged intranasally. The CBA/N mice are observed daily after challenge, and the mortality is monitored for 14 days. Statistically significant reduction of nasopharyngeal colonization and/or mortality indicates that the polypeptide is suitable for use in human clinical trials.
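Colony counts from the serially diluted homogenate can be converted back to a bacterial load by a standard back-calculation. The sketch below is illustrative only: the plated volume, the dilution factors and the function names are assumptions that do not appear in this description, and the log10 reduction shown is a descriptive summary rather than a test of statistical significance.

```python
import math


def cfu_per_ml(colonies: int, dilution_factor: float, plated_volume_ml: float) -> float:
    """Back-calculate cfu per mL of undiluted homogenate from one plate.

    dilution_factor is the total fold dilution of the plated sample
    (1000 for a 1:1000 dilution); plated_volume_ml is the volume spread
    on the plate. Both values are hypothetical inputs.
    """
    return colonies * dilution_factor / plated_volume_ml


def log10_reduction(control_cfu_per_ml: float, immunized_cfu_per_ml: float) -> float:
    """Log10 drop in colonization of an immunized group relative to a control group."""
    return math.log10(control_cfu_per_ml / immunized_cfu_per_ml)


if __name__ == "__main__":
    # Hypothetical plate counts for a control and an immunized animal.
    control = cfu_per_ml(colonies=120, dilution_factor=1000, plated_volume_ml=0.5)
    immunized = cfu_per_ml(colonies=12, dilution_factor=100, plated_volume_ml=0.5)
    print(control, immunized, log10_reduction(control, immunized))
```

Whether a given reduction is statistically significant would still be decided by an appropriate statistical comparison across the animals in each group.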

The Streptococcus pneumoniae polynucleotides, polypeptides, modulators of a Streptococcus pneumoniae polypeptide, and anti-Streptococcus pneumoniae antibodies (also referred to hereinafter as “active compounds”) of the invention are incorporated into pharmaceutical and immunogenic compositions suitable for administration to a subject, e.g., a human. Such compositions typically comprise the nucleic acid molecule, protein, modulator, or antibody and a pharmaceutically acceptable carrier. As used hereinafter the language “pharmaceutically acceptable carrier” is intended to include any and all solvents, dispersion media, coatings, antibacterial and antifungal agents, isotonic and absorption delaying agents, and the like, compatible with pharmaceutical administration. The use of such media and agents for pharmaceutically active substances is well known in the art. Except insofar as any conventional media or agent is incompatible with the active compound, such media can be used in the compositions of the invention. Supplementary active compounds can also be incorporated into the compositions.

A pharmaceutical or immunogenic composition of the invention is formulated to be compatible with its intended route of administration. Examples of routes of administration include parenteral (e.g., intravenous, intradermal, subcutaneous, intraperitoneal), transmucosal (e.g., oral, rectal, intranasal, vaginal, respiratory) and transdermal (topical). Solutions or suspensions used for parenteral, intradermal, or subcutaneous application can include the following components: a sterile diluent such as water for injection, saline solution, fixed oils, polyethylene glycols, glycerine, propylene glycol or other synthetic solvents; antibacterial agents such as benzyl alcohol or methyl parabens; antioxidants such as ascorbic acid or sodium bisulfite; chelating agents such as ethylenediaminetetraacetic acid; buffers such as acetates, citrates or phosphates; and agents for the adjustment of tonicity such as sodium chloride or dextrose. pH can be adjusted with acids or bases, such as hydrochloric acid or sodium hydroxide. The parenteral preparation can be enclosed in ampoules, disposable syringes or multiple dose vials made of glass or plastic.

Pharmaceutical compositions suitable for injectable use include sterile aqueous solutions (where water soluble) or dispersions and sterile powders for the extemporaneous preparation of sterile injectable solutions or dispersions. For intravenous administration, suitable carriers include physiological saline, bacteriostatic water, Cremophor EL™ (BASF, Parsippany, N.J.) or phosphate buffered saline (PBS). In all cases, the composition must be sterile and should be fluid to the extent that easy syringability exists. It must be stable under the conditions of manufacture and storage and must be preserved against the contaminating action of microorganisms such as bacteria and fungi. The carrier can be a solvent or dispersion medium containing, for example, water, ethanol, polyol (for example, glycerol, propylene glycol, and liquid polyethylene glycol, and the like), and suitable mixtures thereof. The proper fluidity can be maintained, for example, by the use of a coating such as lecithin, by the maintenance of the required particle size in the case of dispersion and by the use of surfactants. Prevention of the action of microorganisms can be achieved by various antibacterial and antifungal agents, for example, parabens, chlorobutanol, phenol, ascorbic acid, thimerosal, and the like. In many cases, it will be preferable to include isotonic agents, for example, sugars, polyalcohols such as mannitol and sorbitol, or sodium chloride, in the composition. Prolonged absorption of the injectable compositions can be brought about by including in the composition an agent which delays absorption, for example, aluminum monostearate and gelatin.

Sterile injectable solutions can be prepared by incorporating the active compound (e.g., a Streptococcus pneumoniae polypeptide or anti-Streptococcus pneumoniae antibody) in the required amount in an appropriate solvent with one or a combination of ingredients enumerated above, as required, followed by filter sterilization. Generally, dispersions are prepared by incorporating the active compound into a sterile vehicle which contains a basic dispersion medium and the required other ingredients from those enumerated above. In the case of sterile powders for the preparation of sterile injectable solutions, the preferred methods of preparation are vacuum drying and freeze-drying, which yield a powder of the active ingredient plus any additional desired ingredient from a previously sterile-filtered solution thereof.

Oral compositions generally include an inert diluent or an edible carrier. They can be enclosed in gelatin capsules or compressed into tablets. For the purpose of oral therapeutic administration, the active compound can be incorporated with excipients and used in the form of tablets, troches, or capsules. Oral compositions can also be prepared using a fluid carrier for use as a mouthwash, wherein the compound in the fluid carrier is applied orally and swished and expectorated or swallowed. Pharmaceutically compatible binding agents, and/or adjuvant materials can be included as part of the composition. The tablets, pills, capsules, troches and the like can contain any of the following ingredients, or compounds of a similar nature: a binder such as microcrystalline cellulose, gum tragacanth or gelatin; an excipient such as starch or lactose; a disintegrating agent such as alginic acid, Primogel, or corn starch; a lubricant such as magnesium stearate or Sterotes; a glidant such as colloidal silicon dioxide; a sweetening agent such as sucrose or saccharin; or a flavoring agent such as peppermint, methyl salicylate, or orange flavoring.

For administration by inhalation, the compounds are delivered in the form of an aerosol spray from a pressurized container or dispenser which contains a suitable propellant, e.g., a gas such as carbon dioxide, or a nebulizer. Systemic administration can also be by transmucosal or transdermal means. For transmucosal or transdermal administration, penetrants appropriate to the barrier to be permeated are used in the formulation. Such penetrants are generally known in the art, and include, for example, for transmucosal administration, detergents, bile salts, and fusidic acid derivatives. Transmucosal administration can be accomplished through the use of nasal sprays or suppositories. For transdermal administration, the active compounds are formulated into ointments, salves, gels, or creams as generally known in the art.

The compounds can also be prepared in the form of suppositories (e.g., with conventional suppository bases such as cocoa butter and other glycerides) or retention enemas for rectal delivery.

In one embodiment, the active compounds are prepared with carriers that will protect the compound against rapid elimination from the body, such as a controlled release formulation, including implants and microencapsulated delivery systems.

Biodegradable, biocompatible polymers can be used, such as ethylene vinyl acetate, polyanhydrides, polyglycolic acid, collagen, polyorthoesters, and polylactic acid. Methods for preparation of such formulations will be apparent to those skilled in the art. The materials can also be obtained commercially from Alza Corporation and Nova Pharmaceuticals, Inc. Liposomal suspensions (including liposomes targeted to infected cells with monoclonal antibodies to viral antigens) can also be used as pharmaceutically acceptable carriers. These can be prepared according to methods known to those skilled in the art, for example, as described in U.S. Pat. No. 4,522,811 which is incorporated hereinafter by reference.

It is especially advantageous to formulate oral or parenteral compositions in dosage unit form for ease of administration and uniformity of dosage. Dosage unit form as used hereinafter refers to physically discrete units suited as unitary dosages for the subject to be treated, each unit containing a predetermined quantity of active compound calculated to produce the desired therapeutic effect in association with the required pharmaceutical carrier. The specification for the dosage unit forms of the invention is dictated by and directly dependent on the unique characteristics of the active compound and the particular therapeutic effect to be achieved, and the limitations inherent in the art of compounding such an active compound for the treatment of individuals.

Combination immunogenic compositions are provided by including two or more of the polypeptides of the invention, as well as by combining one or more of the polypeptides of the invention with one or more known S. pyogenes polypeptides, including, but not limited to, the C5a peptidase, the M proteins, adhesins and the like.

In other embodiments, combination immunogenic compositions are provided by combining one or more of the polypeptides of the invention with one or more known S. pneumoniae polysaccharides or polysaccharide-protein conjugates, including, but not limited to, the currently available 23-valent pneumococcal capsular polysaccharide vaccine and the 7-valent pneumococcal polysaccharide-protein conjugate vaccine.

The nucleic acid molecules of the invention are inserted into a variety of vectors and expression systems. A great variety of expression systems are used. Such systems include, among others, chromosomal, episomal and virus-derived systems, e.g., vectors derived from bacterial plasmids, from attenuated bacteria such as Salmonella (U.S. Pat. No. 4,837,151), from bacteriophage, from transposons, from yeast episomes, from insertion elements, from yeast chromosomal elements, from viruses such as vaccinia and other poxviruses, sindbis, adenovirus, baculoviruses, papovaviruses such as SV40, fowl pox viruses, pseudorabies viruses and retroviruses, alphaviruses such as Venezuelan equine encephalitis virus (U.S. Pat. No. 5,643,576), nonsegmented negative-stranded RNA viruses such as vesicular stomatitis virus (U.S. Pat. No. 6,168,943), and vectors derived from combinations thereof, such as those derived from plasmid and bacteriophage genetic elements, such as cosmids and phagemids. The expression systems should include control regions that regulate as well as engender expression, such as promoters and other regulatory elements (such as a polyadenylation signal). Generally, any system or vector suitable to maintain, propagate or express polynucleotides to produce a polypeptide in a host may be used. The appropriate nucleotide sequence may be inserted into an expression system by any of a variety of well-known and routine techniques, such as, for example, those set forth in Sambrook et al., “Molecular Cloning: A Laboratory Manual”, 2nd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989.

A pharmaceutically acceptable vehicle is understood to designate a compound or a combination of compounds entering into a pharmaceutical or immunogenic composition which does not cause side effects and which makes it possible, for example, to facilitate the administration of the active compound, to increase its life and/or its efficacy in the body, to increase its solubility in solution or alternatively to enhance its preservation. These pharmaceutically acceptable vehicles are well known and will be adapted by persons skilled in the art according to the nature and the mode of administration of the active compound chosen.

As defined hereinafter, an “adjuvant” is a substance that serves to enhance the immunogenicity of an “antigen” or the immunogenic compositions comprising polypeptide antigens having an amino acid sequence chosen from one of SEQ ID NO: 216 through SEQ ID NO: 430 or SEQ ID NO: 592 through SEQ ID NO: 752. Thus, adjuvants are often given to boost the immune response and are well known to the skilled artisan. Examples of adjuvants contemplated in the present invention include, but are not limited to, aluminum salts (alum) such as aluminum phosphate and aluminum hydroxide, Mycobacterium tuberculosis, Bordetella pertussis, bacterial lipopolysaccharides, aminoalkyl glucosamine phosphate compounds (AGP), or derivatives or analogs thereof, which are available from Corixa (Hamilton, Mont.), and which are described in U.S. Pat. No. 6,113,918; one such AGP is 2-[(R)-3-Tetradecanoyloxytetradecanoylamino]ethyl 2-Deoxy-4-O-phosphono-3-O-[(R)-3-tetradecanoyloxytetradecanoyl]-2-[(R)-3-tetradecanoyloxytetradecanoylamino]-β-D-glucopyranoside, which is also known as 529 (formerly known as RC529), which is formulated as an aqueous form or as a stable emulsion; MPL™ (3-O-deacylated monophosphoryl lipid A) (Corixa) described in U.S. Pat. No. 4,912,094; synthetic polynucleotides such as oligonucleotides containing a CpG motif (U.S. Pat. No. 6,207,646); polypeptides; saponins such as Quil A or STIMULON™ QS-21 (Antigenics, Framingham, Mass.), described in U.S. Pat. No. 5,057,540; a pertussis toxin (PT), or an E. coli heat-labile toxin (LT), particularly LT-K63, LT-R72, CT-S109, PT-K9/G129 (see, e.g., International Patent Publication Nos. WO 93/13302 and WO 92/19265); and cholera toxin (either in a wild-type or mutant form, e.g., wherein the glutamic acid at amino acid position 29 is replaced by another amino acid, preferably a histidine, in accordance with published International Patent Application number WO 00/18434). Various cytokines and lymphokines are suitable for use as adjuvants.
One such adjuvant is granulocyte-macrophage colony stimulating factor (GM-CSF), which has a nucleotide sequence as described in U.S. Pat. No. 5,078,996. A plasmid containing GM-CSF cDNA has been transformed into E. coli and has been deposited with the American Type Culture Collection (ATCC), 1081 University Boulevard, Manassas, Va. 20110-2209, under Accession Number 39900. The cytokine Interleukin-12 (IL-12) is another adjuvant, which is described in U.S. Pat. No. 5,723,127. Other cytokines or lymphokines have been shown to have immune modulating activity, including, but not limited to, the interleukins 1-alpha, 1-beta, 2, 4, 5, 6, 7, 8, 10, 13, 14, 15, 16, 17 and 18, the interferons-alpha, beta and gamma, granulocyte colony stimulating factor, and the tumor necrosis factors alpha and beta, and are suitable for use as adjuvants.

A composition of the present invention is typically administered parenterally in dosage unit formulations containing standard, well-known nontoxic physiologically acceptable carriers, adjuvants, and vehicles as desired. The term parenteral as used hereinafter includes intravenous, intramuscular, and intraarterial injection, or infusion techniques.

Injectable preparations, for example sterile injectable aqueous or oleaginous suspensions, are formulated according to the known art using suitable dispersing or wetting agents and suspending agents. The sterile injectable preparation can also be a sterile injectable solution or suspension in a nontoxic parenterally acceptable diluent or solvent, for example, as a solution in 1,3-butanediol.

Among the acceptable vehicles and solvents that may be employed are water, Ringer's solution, and isotonic sodium chloride solution. In addition, sterile, fixed oils are conventionally employed as a solvent or suspending medium. For this purpose any bland fixed oil can be employed including synthetic mono- or di-glycerides. In addition, fatty acids such as oleic acid find use in the preparation of injectables.

Preferred carriers include neutral saline solutions buffered with phosphate, lactate, Tris, and the like. Of course, when administering viral vectors, one purifies the vector sufficiently to render it essentially free of undesirable contaminants, such as defective interfering adenovirus particles or endotoxins and other pyrogens such that it does not cause any untoward reactions in the individual receiving the vector construct. A preferred means of purifying the vector involves the use of buoyant density gradients, such as cesium chloride gradient centrifugation.

A carrier can also be a liposome. Means for using liposomes as delivery vehicles are well known in the art (see, e.g., Gabizon et al., 1990; Ferruti et al., 1986; and Ranade, 1989).

The immunogenic compositions of this invention also comprise a polynucleotide sequence of this invention operatively associated with a regulatory sequence that controls gene expression. The polynucleotide sequence of interest is engineered into an expression vector, such as a plasmid, under the control of regulatory elements which will promote expression of the DNA, that is, promoter and/or enhancer elements. In a preferred embodiment, the human cytomegalovirus immediate-early promoter/enhancer is used (U.S. Pat. No. 5,168,062). The promoter may be cell-specific and permit substantial transcription of the polynucleotide only in predetermined cells.

The polynucleotide is introduced directly into the host either as “naked” DNA (U.S. Pat. No. 5,580,859) or formulated in compositions with agents which facilitate immunization, such as bupivacaine and other local anesthetics (U.S. Pat. No. 5,593,972) and cationic polyamines (U.S. Pat. No. 6,127,170).

In this polynucleotide immunization procedure, the polypeptides of the invention are expressed on a transient basis in vivo; no genetic material is inserted or integrated into the chromosomes of the host. This procedure is to be distinguished from gene therapy, where the goal is to insert or integrate the genetic material of interest into the chromosome. An assay is used to confirm that the polynucleotides administered by immunization do not give rise to a transformed phenotype in the host (U.S. Pat. No. 6,168,918).

H. Uses and Methods of the Invention

The Streptococcus pneumoniae polynucleotides, polypeptides, polypeptide homologues, modulators, adjuvants, and antibodies described in this invention can be used in methods of treatment, diagnostic assays (particularly in disease identification), drug screening assays and monitoring of effects during clinical trials. The isolated polynucleotides of the invention can be used to express Streptococcus pneumoniae polypeptides (e.g., via a recombinant expression vector in a host cell or in polynucleotide immunization applications) and to detect Streptococcus pneumoniae mRNA (e.g., in a biological sample). Moreover, the anti-Streptococcus pneumoniae antibodies of the invention can be used to detect and isolate a Streptococcus pneumoniae polypeptide, particularly fragments of Streptococcus pneumoniae polypeptides present in a biological sample, and to modulate Streptococcus pneumoniae polypeptide activity.

The invention provides immunogenic compositions comprising polypeptides having an amino acid sequence chosen from one of SEQ ID NO:216 through SEQ ID NO:430 or SEQ ID NO: 592 through SEQ ID NO: 752, a biological equivalent thereof or a fragment thereof. The immunogenic composition may further comprise a pharmaceutically acceptable carrier, as outlined in section G. In certain preferred embodiments, the immunogenic composition will comprise one or more adjuvants.

In another embodiment, the invention provides immunogenic compositions comprising a polynucleotide having a nucleotide sequence chosen from one of SEQ ID NO:1 through SEQ ID NO:215 or SEQ ID NO: 431 through SEQ ID NO: 591, wherein the polynucleotide is comprised in a recombinant expression vector. Preferably the vector is plasmid DNA. Of course, the polynucleotide may further comprise heterologous nucleotides, e.g., the polynucleotide is operatively linked to one or more gene expression regulatory elements, and the composition may further comprise one or more adjuvants. In a preferred embodiment, the immunogenic polynucleotide composition directs the expression of a neutralizing epitope of Streptococcus pneumoniae.

Provided also are methods for immunizing a host against Streptococcus pneumoniae infection. In a preferred embodiment, the host is human. Thus, a host or subject is administered an immunizing amount of an immunogenic composition comprising a polypeptide having an amino acid sequence chosen from one of SEQ ID NO:216 through SEQ ID NO:430 or SEQ ID NO: 592 through SEQ ID NO: 752, a biological equivalent thereof or a fragment thereof and a pharmaceutically acceptable carrier. An immunizing amount of an immunogenic composition can be determined by doing a dose response study in which subjects are immunized with gradually increasing amounts of the immunogenic composition and the immune response analyzed to determine the optimal dosage. Starting points for the study can be inferred from immunization data in animal models. The dosage amount can vary depending upon specific conditions of the individual. The amount can be determined in routine trials by means known to those skilled in the art.

An immunologically effective amount of the immunogenic composition in an appropriate number of doses is administered to the subject to elicit an immune response. Immunologically effective amount, as used herein, means the administration of that amount to a mammalian host (preferably human), either in a single dose or as part of a series of doses, sufficient to at least cause the immune system of the individual treated to generate a response that reduces the clinical impact of the bacterial infection. Protection may be conferred by a single dose of the immunogenic composition or vaccine, or may require the administration of several doses, in addition to booster doses at later times to maintain protection. This may range from a minimal decrease in bacterial burden to prevention of the infection. Ideally, the treated individual will not exhibit the more serious clinical manifestations of the Streptococcus pneumoniae infection. The dosage amount can vary depending upon specific conditions of the individual, such as age and weight. This amount can be determined in routine trials by means known to those skilled in the art.

I. Diagnostic Assays

The invention provides methods for detecting the presence of a Streptococcus pneumoniae polypeptide or Streptococcus pneumoniae polynucleotide, or fragment thereof, in a biological sample. The method involves contacting the biological sample with a compound or an agent capable of detecting a Streptococcus pneumoniae polypeptide or mRNA such that the presence of the Streptococcus pneumoniae polypeptide/encoding nucleic acid molecule is detected in the biological sample. A preferred agent for detecting Streptococcus pneumoniae mRNA or DNA is a labeled or labelable oligonucleotide probe capable of hybridizing to Streptococcus pneumoniae mRNA or DNA. The nucleic acid probe can be, for example, a full-length Streptococcus pneumoniae polynucleotide of one of SEQ ID NO: 1 through SEQ ID NO:215 or SEQ ID NO: 431 through SEQ ID NO: 591, a complement thereof, or a fragment thereof, such as an oligonucleotide of at least 15, 30, 50, 100, 250 or 500 nucleotides in length and sufficient to specifically hybridize under stringent conditions to Streptococcus pneumoniae mRNA or DNA. Alternatively, the sample can be contacted with an oligonucleotide primer of a Streptococcus pneumoniae polynucleotide of one of SEQ ID NO: 1 through SEQ ID NO:215 or SEQ ID NO: 431 through SEQ ID NO: 591, a complement thereof, or a fragment thereof, in the presence of nucleotides and a polymerase, under conditions permitting primer extension.
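By way of illustration only, the relationship between a probe, its complement and a target strand may be sketched computationally. The following minimal Python sketch assumes a DNA alphabet (A, C, G, T) and treats an exact match to the reverse complement of the probe as a crude stand-in for specific hybridization; it does not model stringency, mismatches or melting temperature. The function names and the example sequences are hypothetical and are not taken from the sequence listing.

```python
# Illustrative sketch: locate sites on a target strand to which a probe
# could anneal, using exact reverse-complement matching (an assumption;
# real hybridization tolerates mismatches and depends on stringency).

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
MIN_PROBE_NT = 15  # shortest probe length mentioned in the text


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def probe_sites(target, probe):
    """Return 0-based positions on `target` where `probe` could anneal."""
    target, probe = target.upper(), probe.upper()
    if len(probe) < MIN_PROBE_NT:
        raise ValueError("probe shorter than %d nucleotides" % MIN_PROBE_NT)
    site = reverse_complement(probe)  # sequence on the target bound by the probe
    positions = []
    i = target.find(site)
    while i != -1:
        positions.append(i)
        i = target.find(site, i + 1)
    return positions


# Hypothetical 15-nucleotide site embedded in a short target
site = "ATGCATGCCGTTAGC"
target = "GGGG" + site + "TTTT"
print(probe_sites(target, reverse_complement(site)))
```

An mRNA target would first have to be written in the DNA alphabet (U replaced by T) for this sketch to apply.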

A preferred agent for detecting Streptococcus pneumoniae polypeptide is a labeled or labelable antibody capable of binding to a Streptococcus pneumoniae polypeptide. Antibodies can be polyclonal, or more preferably, monoclonal. An intact antibody, or a fragment thereof (e.g., Fab or F(ab′)2) can be used. The term “labeled or labelable,” with regard to the probe or antibody, is intended to encompass direct labeling of the probe or antibody by coupling (i.e., physically linking) a detectable substance to the probe or antibody, as well as indirect labeling of the probe or antibody by reactivity with another reagent that is directly labeled. Examples of indirect labeling include detection of a primary antibody using a fluorescently labeled secondary antibody and end-labeling of a DNA probe with biotin such that it can be detected with fluorescently labeled streptavidin. The term “biological sample” is intended to include tissues, cells and biological fluids isolated from a subject, as well as tissues, cells and fluids present within a subject. That is, the detection method of the invention can be used to detect Streptococcus pneumoniae mRNA, DNA, or protein in a biological sample in vitro as well as in vivo. For example, in vitro techniques for detection of Streptococcus pneumoniae mRNA include Northern hybridizations and in situ hybridizations. In vitro techniques for detection of Streptococcus pneumoniae polypeptide include enzyme linked immunosorbent assays (ELISAs), Western blots, immunoprecipitations and immunofluorescence. Alternatively, Streptococcus pneumoniae polypeptides can be detected in vivo in a subject by introducing into the subject a labeled anti-Streptococcus pneumoniae antibody. For example, the antibody can be labeled with a radioactive marker whose presence and location in a subject can be detected by standard imaging techniques.

The polynucleotides according to the invention may also be used in analytical DNA chips, which allow sequencing, the study of mutations and of the expression of genes, and which are currently of interest given their very small size and their high capacity in terms of number of analyses.

The principle of the operation of these chips is based on molecular probes, most often oligonucleotides, which are attached onto a miniaturized surface, generally of the order of a few square centimeters. During an analysis, a sample containing fragments of a target nucleic acid to be analysed, for example DNA or RNA labelled, for example, after amplification, is deposited onto the DNA chip in which the support has been coated beforehand with probes. Bringing the labelled target sequences into contact with the probes leads to the formation, through hybridization, of a duplex according to the rule of pairing defined by J. D. Watson and F. Crick. After a washing step, analysis of the surface of the chip allows the effective hybridizations to be located by means of the signals emitted by the labels tagging the target. A hybridization fingerprint results from this analysis which, by appropriate computer processing, will make it possible to determine information such as the presence of specific fragments in the sample, the determination of sequences and the presence of mutations.

The chip consists of a multitude of molecular probes, precisely organized or arrayed on a solid support whose surface is miniaturized. It is at the centre of a system where other elements (imaging system, microcomputer) allow the acquisition and interpretation of a hybridization fingerprint.

The hybridization supports are provided in the form of flat or porous surfaces (pierced with wells) composed of various materials. The choice of a support is determined by its physicochemical properties, or more precisely, by the relationship between the latter and the conditions under which the support will be placed during the synthesis or the attachment of the probes or during the use of the chip. It is therefore necessary, before considering the use of a particular support, to consider characteristics such as its stability to pH, its physical strength, its reactivity and its chemical stability as well as its capacity to nonspecifically bind nucleic acids. Materials such as glass, silicon and polymers are commonly used. Their surface is, in a first step, called “functionalization”, made reactive towards the groups which it is desired to attach thereon. After the functionalization, so-called spacer molecules are grafted onto the activated surface. Used as intermediates between the surface and the probe, these molecules of variable size render unimportant the surface properties of the supports, which often prove to be problematic for the synthesis or the attachment of the probes and for the hybridization.

Among the hybridization supports, there may be mentioned glass which is used, for example, in the method of in situ synthesis of oligonucleotides by photochemical addressing developed by the company Affymetrix (E. L. Sheldon, 1993), the glass surface being activated by silane. Genosensor Consortium (P. Mérel, 1994) also uses glass slides carrying wells 3 mm apart, this support being activated with epoxysilane.

The probes according to the invention may be synthesized directly in situ on the supports of the DNA chips. This in situ synthesis may be carried out by photochemical addressing (developed by the company Affymax (Amsterdam, Holland) and exploited industrially by its subsidiary Affymetrix (United States)), or based on the VLSIPS (very large scale immobilized polymer synthesis) technology (S. P. A. Fodor et al., 1991), which is based on a method of photochemically directed combinatorial synthesis, the principle of which combines solid-phase chemistry, the use of photolabile protecting groups and photolithography.

The probes according to the invention may be attached to the DNA chips in various ways such as electrochemical addressing, automated addressing or the use of probe printers (T. Livache et al., 1994; G. Yershov et al., 1996; J. Derisi et al., 1996, and S. Borman, 1996).

The hybridization between the probes of the invention, deposited or synthesized in situ on the supports of the DNA chips, and the sample to be analysed may be revealed, for example, by measurement of fluorescent signals, by radioactive counting or by electronic detection.

The use of fluorescent molecules such as fluorescein constitutes the most common method of labelling the samples. It allows direct or indirect revealing of the hybridization and allows the use of various fluorochromes.

Affymetrix currently provides an apparatus or a scanner designed to read its Gene Chip™ chips. It makes it possible to detect the hybridizations by scanning the surface of the chip by confocal microscopy (R. J. Lipshutz et al., 1995).

The nucleotide sequences according to the invention may also be used in DNA chips to carry out the analysis of the expression of the Streptococcus pneumoniae genes. This analysis of the expression of Streptococcus pneumoniae genes is based on the use of chips where probes of the invention, chosen for their specificity to characterize a given gene, are present (D. J. Lockhart et al., 1996; D. D. Shoemaker et al., 1996). For the methods of analysis of gene expression using the DNA chips, reference may, for example, be made to the methods described by D. J. Lockhart et al. (1996) and Sosnowsky et al. (1997) for the synthesis of probes in situ or for the addressing and the attachment of previously synthesized probes. The target sequences to be analysed are labelled and in general fragmented into sequences of about 50 to 100 nucleotides before being hybridized onto the chip. After washing as described, for example, by D. J. Lockhart et al. (1996) and application of different electric fields (Sosnowsky et al., 1997), the labelled compounds are detected and quantified, the hybridizations being carried out at least in duplicate. Comparative analyses of the signal intensities obtained with respect to the same probe for different samples and/or for different probes with the same sample, determine the differential expression of RNA or copy numbers of DNA derived from the sample.
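As a purely illustrative sketch of such a comparative analysis, the signal intensities obtained for one probe with two samples may be summarized as a log2 ratio of the mean replicate intensities. The use of a log2 ratio, the absence of background correction and normalization, and the function name and intensity values are all assumptions made for illustration; the text above does not prescribe a particular calculation.

```python
# Illustrative sketch: compare replicate signal intensities for the same
# probe in two samples. The log2 ratio is an assumed summary statistic.
import math
from statistics import mean


def log2_ratio(sample_a, sample_b):
    """log2 of mean intensity in sample A over mean intensity in sample B.

    Each argument is a list of replicate intensities for one probe; at
    least two replicates are required, since hybridizations are carried
    out at least in duplicate.
    """
    if len(sample_a) < 2 or len(sample_b) < 2:
        raise ValueError("at least duplicate hybridizations are required")
    return math.log2(mean(sample_a) / mean(sample_b))


# Hypothetical intensities: a four-fold higher signal in sample A
print(log2_ratio([800, 800], [200, 200]))
```

A positive value indicates a stronger signal in the first sample and a negative value a stronger signal in the second.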

The nucleotide sequences according to the invention may, in addition, be used in DNA chips where other nucleotide probes specific for other microorganisms are also present, and may allow the carrying out of a serial test allowing rapid identification of the presence of a microorganism in a sample.

Accordingly, the subject of the invention is also the nucleotide sequences according to the invention, characterized in that they are immobilized on a support of a DNA chip.

The DNA chips, characterized in that they contain at least one nucleotide sequence according to the invention, immobilized on the support of the said chip, also form part of the invention.

The chips will preferably contain several probes or nucleotide sequences of the invention of different length and/or corresponding to different genes so as to identify, with greater certainty, the specificity of the target sequences or the desired mutation in the sample to be analysed.

Accordingly, the analyses carried out by means of primers and/or probes according to the invention, immobilized on supports such as DNA chips, will make it possible, for example, to identify, in samples, mutations linked to variations such as intraspecies variations. These variations may be correlated or associated with pathologies specific to the variant identified and will make it possible to select the appropriate treatment.

The invention thus comprises a DNA chip according to the invention, characterized in that it contains, in addition, at least one nucleotide sequence of a microorganism different from Streptococcus pneumoniae, immobilized on the support of the said chip; preferably, the different microorganism will be chosen from an associated microorganism, a bacterium of the Streptococcus family, and a variant of the species Streptococcus pneumoniae.

The principle of the DNA chip, as explained above, may also be used to produce protein “chips” on which the support has been coated with a polypeptide or an antibody according to the invention, or arrays thereof, in place of the DNA. These protein “chips” make it possible, for example, to analyse the biomolecular interactions (BIA) induced by the affinity capture of target analytes onto a support coated, for example, with proteins, by surface plasmon resonance (SPR). Reference may be made, for example, to the techniques for coupling proteins onto a solid support which are described in International Application EP 524 800 or to the methods describing the use of biosensor-type protein chips such as the BIAcore-type technique (Pharmacia) (Arlinghaus et al., 1997, Krone et al., 1997, Chatelier et al., 1995). These polypeptides or antibodies according to the invention, capable of specifically binding antibodies or polypeptides derived from the sample to be analysed, may thus be used in protein chips for the detection and/or the identification of proteins in samples. The said protein chips may in particular be used for infectious diagnosis and may preferably contain, per chip, several polypeptides and/or antibodies of the invention of different specificity, and/or polypeptides and/or antibodies capable of recognizing microorganisms different from Streptococcus pneumoniae.

Accordingly, the subject of the present invention is also the polypeptides and the antibodies according to the invention, characterized in that they are immobilized on a support, in particular of a protein chip.

The protein chips, characterized in that they contain at least one polypeptide or one antibody according to the invention immobilized on the support of the said chip, also form part of the invention.

The invention comprises, in addition, a protein chip according to the invention, characterized in that it contains, in addition, at least one polypeptide of a microorganism different from Streptococcus pneumoniae or at least one antibody directed against a compound of a microorganism different from Streptococcus pneumoniae, immobilized on the support of the chip.

The invention also relates to a kit or set for the detection and/or the identification of bacteria belonging to the species Streptococcus pneumoniae or to an associated microorganism, or for the detection and/or the identification of a microorganism, characterized in that it comprises a protein chip according to the invention.

The present invention also provides a method for the detection and/or the identification of bacteria belonging to the species Streptococcus pneumoniae or to an associated microorganism in a biological sample, characterized in that it uses a nucleotide sequence according to the invention.

The invention also encompasses kits for detecting the presence of a Streptococcus pneumoniae polypeptide in a biological sample. For example, the kit comprises reagents such as a labeled or labelable compound or agent capable of detecting Streptococcus pneumoniae polypeptide or mRNA in a biological sample; means for determining the amount of Streptococcus pneumoniae polypeptide in the sample; and means for comparing the amount of Streptococcus pneumoniae polypeptide in the sample with a standard. The compound or agent is packaged in a suitable container. The kit further comprises instructions for using the kit to detect Streptococcus pneumoniae mRNA or protein.

In certain embodiments, detection involves the use of a probe/primer in a polymerase chain reaction (PCR) (see, e.g., U.S. Pat. No. 4,683,195 and U.S. Pat. No. 4,683,202), such as anchor PCR or RACE PCR, or, alternatively, in a ligation chain reaction (LCR). This method includes the steps of collecting a sample of cells from a patient, isolating nucleic acid (e.g., genomic, mRNA or both) from the cells of the sample, contacting the nucleic acid sample with one or more primers which specifically hybridize to a Streptococcus pneumoniae polynucleotide under conditions such that hybridization and amplification of the Streptococcus pneumoniae polynucleotide (if present) occurs, and detecting the presence or absence of an amplification product, or detecting the size of the amplification product and comparing the length to a control sample.

All patents and publications cited herein are hereby incorporated by reference.

EXAMPLES

The following examples are carried out using standard techniques, which are well known and routine to those of skill in the art, except where otherwise described in detail.

The following examples are presented for illustrative purposes, and should not be construed as in any way limiting the scope of this invention.

Example 1 Bioinformatics and Gene Mining of Streptococcus pneumoniae

The genomic sequence of Streptococcus pneumoniae was downloaded from The Institute for Genomic Research (TIGR) website and novel open reading frames (ORFs) were determined in the following manner. An ORF was defined as having one of three potential start site codons, ATG, GTG or TTG and one of three potential stop codons, TAA, TAG or TGA. The inventors used a unique set of two ORF finder algorithms: GLIMMER (Salzberg et al., 1998) and inventors' assignee's program to enhance the efficiency for finding “all” ORFs. In order to evaluate the accuracy of the ORFs determined, a program developed by inventors' assignee called DiCTion was employed that uses a discrete mathematical cosine function to assign a score for each ORF. An ORF with a DiCTion score >1.5 is considered to have a high probability of encoding a protein product. The minimum length of an ORF predicted by the two ORF finding algorithms was set to 225 nucleotides (including stop codon) which would encode a protein of 74 amino acids. As a final search for remnants of ORFs, all noncoding regions >75 nucleotides were searched against the public protein databases (described below) using tBLASTn. This helped to identify regions of genes that contain frameshifts (Mejlhede et al., 1999) or fragments of genes that might have a role in causing antigenic variation (Fraser et al., 1997). A graphical analysis program developed by inventors' assignee also allowed the inventors to see all six reading frames and the location of the predicted ORFs relative to the genomic sequence for further inspection. This helped to eliminate those ORFs that have large overlaps with other ORFs, although there are known cases of ORFs being totally embedded within other ORFs (Loessner et al., 1999; Hernandez-Sanchez et al., 1998).
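For illustration only, the ORF definition given above (start codon ATG, GTG or TTG; stop codon TAA, TAG or TGA; minimum length of 225 nucleotides including the stop codon) may be expressed as a simple six-frame scan. The following Python sketch is not GLIMMER, nor the inventors' assignee's program, nor DiCTion; it is a naive scan that reports, in each frame, the longest ORF ending at each stop codon. The function names are hypothetical, and coordinates on the minus strand are given relative to the reverse-complemented sequence, which is a simplifying assumption.

```python
# Illustrative six-frame ORF scan using the start/stop codons and the
# 225-nucleotide minimum length stated in the text. Not the authors' tools.

START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}
MIN_ORF_NT = 225  # minimum ORF length in nucleotides, stop codon included

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs_in_strand(seq, min_len=MIN_ORF_NT):
    """Return (start, end, frame) tuples; `end` is exclusive and includes the stop."""
    orfs = []
    for frame in range(3):
        start = None  # position of the first start codon since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if end - start >= min_len:
                    orfs.append((start, end, frame))
                start = None
    return orfs


def find_orfs(genome, min_len=MIN_ORF_NT):
    """Scan both strands; minus-strand coordinates refer to the reverse complement."""
    genome = genome.upper()
    hits = [("+", s, e, f) for s, e, f in find_orfs_in_strand(genome, min_len)]
    hits += [("-", s, e, f)
             for s, e, f in find_orfs_in_strand(reverse_complement(genome), min_len)]
    return hits


# A 225-nucleotide ORF: ATG + 73 codons + stop, i.e. 74 amino acids
example = "ATG" + "GCT" * 73 + "TAA"
print(find_orfs(example))
```

The example confirms the arithmetic in the text: 225 nucleotides less the 3-nucleotide stop codon leaves 222 nucleotides, or 74 codons.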

The initial annotation of the Streptococcus pneumoniae ORFs was performed using the BLAST (v. 2.0) Gapped search algorithm, Blastp, to identify homologous sequences (Altschul et al., 1997). A cutoff ‘e’ value of anything <e⁻¹⁰ was considered significant. Other search algorithms such as FASTA or PSI-BLAST were used as needed. The non-redundant protein sequence database used for the homology searches consisted of GenBank, SWISS-PROT (Bairoch and Apweiler, 2000), PIR (Barker et al., 2001), and TREMBL (Bairoch and Apweiler, 2000) database sequences updated daily. ORFs with a Blastp result of >e⁻¹⁰ were considered to be unique to Streptococcus pneumoniae.
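The cutoff described above may be illustrated by a short Python sketch that partitions ORFs according to their best Blastp e-value. The ORF identifiers and e-values are hypothetical, the cutoff is read here as 1e-10, and the handling of ORFs with no hit at all, or with an e-value exactly equal to the cutoff, is an assumption, since the text does not address these cases.

```python
# Illustrative sketch: split ORFs into "homologous" (best e-value below the
# cutoff) and "unique" (no hit, or best e-value at or above the cutoff).

E_VALUE_CUTOFF = 1e-10  # assumed numerical reading of the cutoff in the text


def classify_orfs(best_hits):
    """best_hits maps an ORF id to its best Blastp e-value, or None if no hit."""
    homologous, unique = [], []
    for orf_id, e_value in best_hits.items():
        if e_value is not None and e_value < E_VALUE_CUTOFF:
            homologous.append(orf_id)
        else:
            # Assumption: missing hits and boundary values count as unique
            unique.append(orf_id)
    return homologous, unique


# Hypothetical ORF identifiers and e-values
hits = {"orf_a": 1e-50, "orf_b": 0.5, "orf_c": None}
print(classify_orfs(hits))
```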

A keyword search of the entire BLAST results was carried out using known or suspected target genes for immunogenic compositions, as well as words that identified the location of a protein or function.

Several parameters were used to determine grouping of the predicted proteins. Proteins destined for translocation across the cytoplasmic membrane encode a leader signal (also called signal sequence) composed of a central hydrophobic region flanked at the N-terminus by positively charged residues (Pugsley, 1993). A program, called SignalP, identifies signal peptides and their cleavage sites (Nielsen et al., 1997). To predict protein localization in bacteria, the software PSORT has been used (Nakai and Kanehisa, 1991). This program uses a neural net algorithm to predict localization of proteins to the ‘cytoplasm’, ‘periplasm’, and ‘cytoplasmic membrane’ for Gram-positive bacteria as well as ‘outer membrane’ for Gram-negative bacteria. Transmembrane (TM) domains of proteins have been analyzed using the software program TopPred II (Cserzo et al., 1997).

The Hidden Markov Model (HMM) Pfam database of multiple alignments of protein domains or conserved protein regions (Sonnhammer et al., 1997) was used to identify Streptococcus pneumoniae proteins that may belong to an existing protein family. Keyword searching of this output was used to help identify additional candidate ORFs that may have been missed by the BLAST search criteria. A computer algorithm, called HMM Lipo, was developed by inventors' assignee to predict lipoproteins using approximately 131 biologically proven bacterial lipoproteins. This training set was generated from experimentally proven prokaryotic lipoproteins. The protein sequence from the start of the protein to the cysteine amino acid plus the next two additional amino acids was used to generate the HMM. Using approximately 70 known prokaryotic proteins containing the LPXTG cell wall sorting signal, a HMM (Eddy, 1996) was developed to predict cell wall proteins that are anchored to the peptidoglycan layer (Mazmanian et al., 1999; Navarre and Schneewind, 1999). The model used not only the LPXTG sequence but also included two features of the downstream sequence, first the hydrophobic transmembrane domain and secondly, the positively charged carboxy terminus. There are also a number of proteins that interact, non-covalently, with the peptidoglycan layer and are distinct from the LPXTG protein class described above. These proteins seem to have a consensus sequence at their carboxy terminus (Koebnik, 1995). The inventors' assignee has also developed and used a HMM of this region to identify any Streptococcus pneumoniae proteins that may fall into this class of proteins.
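The three features of the cell wall sorting signal described above (the LPXTG motif, a downstream hydrophobic domain and a positively charged carboxy terminus) may be illustrated by a simple rule-based check. The following Python sketch is not the HMM developed by the inventors' assignee; it substitutes a regular expression and fixed thresholds for the trained model. The set of residues counted as hydrophobic, the window sizes, the thresholds and the example sequence are all hypothetical choices made for illustration.

```python
# Illustrative rule-based stand-in for an LPXTG sorting-signal model.
# All thresholds and residue sets below are assumptions, not trained values.
import re

LPXTG = re.compile(r"LP.TG")
HYDROPHOBIC = set("AVLIFMWGC")  # assumed set of hydrophobic/small residues
POSITIVE = set("KR")            # positively charged residues


def looks_like_lpxtg_anchor(protein, tail_len=50, window=15,
                            min_hydrophobic=0.7, min_positive=2):
    """Heuristically test for LPXTG + hydrophobic stretch + charged C-terminus."""
    protein = protein.upper()
    tail_start = max(0, len(protein) - tail_len)
    # Only consider motifs located near the carboxy terminus
    motifs = [m for m in LPXTG.finditer(protein) if m.start() >= tail_start]
    if not motifs:
        return False
    downstream = protein[motifs[-1].end():]
    if len(downstream) < window:
        return False
    # Feature 1: a mostly hydrophobic window downstream of the motif
    has_hydrophobic_domain = any(
        sum(aa in HYDROPHOBIC for aa in downstream[i:i + window]) / window
        >= min_hydrophobic
        for i in range(len(downstream) - window + 1)
    )
    # Feature 2: positively charged residues among the last eight residues
    has_charged_end = sum(aa in POSITIVE for aa in downstream[-8:]) >= min_positive
    return has_hydrophobic_domain and has_charged_end


# Hypothetical sequence built to contain all three features
candidate = "M" + "S" * 100 + "LPETG" + "AVLIVLAVLIAGLLL" + "KRRKE"
print(looks_like_lpxtg_anchor(candidate))
```

A profile HMM scores each position probabilistically rather than applying hard thresholds, so this sketch would be expected to be considerably less sensitive and less specific than a trained model.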

The proteins encoded by Streptococcus pneumoniae identified ORFs were also evaluated for other useful characteristics. A tandem repeat finder (Benson, 1999) identified ORFs containing repeated DNA sequences such as those found in MSCRAMMs (Foster and Hook, 1998) and phase variable surface proteins of Neisseria meningitidis (Parkhill et al., 2000). Proteins that contain the Arg-Gly-Asp (RGD) attachment motif, together with integrins that serve as their receptor, constitute a major recognition system for cell adhesion. RGD recognition is one mechanism used by microbes to gain entry into eukaryotic tissues (Stockbauer et al., 1999; Isberg and Tran Van Nhieu, 1994). However, not all RGD containing proteins mediate cell attachment.

It has been shown that RGD containing peptides with a proline at the carboxy end (RGDP) are inactive in cell attachment assays (Pierschbacher and Ruoslahti, 1987), and these were excluded. The Geanfammer software was used to cluster proteins into homologous families (Park and Teichmann, 1998). Preliminary analysis of the family classes has provided novel ORFs within a candidate cluster as well as defining potential protein function.
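The exclusion of RGDP motifs may be illustrated with a regular expression that uses a negative lookahead. The following Python sketch is an assumed implementation, not the inventors' actual filter; the function name and example sequences are hypothetical.

```python
# Illustrative sketch: flag proteins carrying an RGD motif that is not
# immediately followed by proline (RGDP is treated as inactive).
import re

ACTIVE_RGD = re.compile(r"RGD(?!P)")  # RGD not followed by P


def has_candidate_rgd(protein):
    """Return True if the sequence contains at least one RGD that is not RGDP."""
    return ACTIVE_RGD.search(protein.upper()) is not None


# Hypothetical sequences
print(has_candidate_rgd("MKRGDSLT"))      # RGD followed by S
print(has_candidate_rgd("MKRGDPLT"))      # only an RGDP motif
print(has_candidate_rgd("MKRGDPAARGDS"))  # one RGDP and one RGDS
```

Under this rule a protein with both an RGDP and another RGD motif is retained, which is one possible reading of the exclusion criterion.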

Example 2 Cloning, Expression and Analysis of Predicted ORF Proteins

Materials and Methods

Growth of Streptococcus pneumoniae.

Streptococcus pneumoniae were grown in Todd Hewitt broth (Difco) supplemented with 0.5% yeast extract. Bacteria were incubated at 35° C. in 5% CO₂ without shaking. Mid-log phase cultures (OD₅₅₀ approx. 0.3) were harvested after approximately 4 hours incubation and cells pelleted by centrifugation (5,000×g) at 4° C.

Cloning and Expression of Predicted ORFs.

The predicted ORFs were cloned and expressed in E. coli TOP10 or BLR(DE3). Expression of each ORF was tested in both pBAD/Thio-TOPO (which contains an arabinose inducible promoter) and pCR-T7/NT-TOPO expression systems (Invitrogen, Carlsbad, Calif.). Gene specific primers were designed to amplify, by polymerase chain reaction (PCR), each selected ORF from Streptococcus pneumoniae CP1200 (Morrison et al., 1983) genomic DNA purified using the Wizard Genomic DNA purification kit (Promega, Madison, Wis.). The 5′ primers were designed to exclude the predicted signal sequence (as predicted by SignalP) and the 3′ primer was designed to either include the stop codon (pCR-T7) or exclude the stop codon (pBAD). ORFs were amplified in a standard polymerase chain reaction (200 μM each dNTP (Invitrogen), 200 μM each 5′ and 3′ gene specific primer, 1 μL stock of chromosomal DNA, 2.5 U Pfu Turbo polymerase (Stratagene, La Jolla, Calif.) and 1×Pfu Turbo reaction buffer in a total volume of 50 μL). Overhanging A's were added to the PCR products by incubation for 10 minutes at 72° C. with 1 U of Taq DNA polymerase (Roche Diagnostics, Indianapolis, Ind.). PCR products were cloned into the expression vectors and transformed into E. coli TOP10 following manufacturer's TOPO-TA cloning protocol (Invitrogen). Positive clones were identified by PCR using one gene specific primer and one vector specific primer to ensure correct orientation.

ORFs cloned into pCR-T7 were transformed into E. coli BL21(DE3) for protein expression using the T7 promoter and those cloned into pBAD were kept in TOP10. Protein expression was determined by growing overnight cultures of the positive clones in 2 mL HySoy broth (DMV International Nutritional, Fraser, N.Y.) supplemented with 100 μg/mL ampicillin. These cultures were then diluted 1:100 into fresh media and grown until OD₆₀₀=1.0. Protein expression was induced with either 2% arabinose (pBAD) or 0.1 mM IPTG (pCRT7). Three hours post-induction, the cells were harvested and protein expression determined by Western blot analysis of whole-cell lysates using either anti-express epitope (pCRT7) or anti-thio (pBAD) antibodies. The best expressing clone (pBAD or pCRT7) was used for protein production and purification.

Fourteen of the ORFs that did not express in either pCR-T7 or pBAD were cloned into pET27b(+) (Novagen, Madison, Wis.). The ORFs were again amplified by PCR and cloned using standard molecular biology techniques into the NcoI and XhoI sites of pET27b(+). Clones were again screened by PCR, and plasmids with the correct insert were transformed into BL21(DE3) and expression was tested as described for pCR-T7. Protein expression was determined by Western blot analysis using anti-HSV epitope antibody.

Purification of Soluble His-Tag ORF Proteins.

Protein was expressed from positive clones in 4×μL of media as described above. Cells were harvested by centrifugation, resuspended in 100 mL of Ni Buffer A (20 mM Tris, pH 7.5, 150 mM NaCl) and lysed by 2 passages through a French pressure cell at 16,000 psi (SLM Instruments, Inc., Rochester, N.Y.).

For soluble proteins, the cell debris was pelleted by centrifugation at ~9,000×g and the supernatant was loaded onto an iminodiacetic acid Sepharose 6B (Sigma Chemical, St. Louis, Mo.) column charged with Ni²⁺. Unbound proteins were washed from the column with Ni Buffer A until the A₂₈₀ of the eluate reached baseline. The bound protein was then eluted with Ni Buffer A containing 300 mM imidazole (Sigma Chemical). Purity was estimated by SDS-PAGE.

Samples requiring further purification were concentrated and buffer exchanged over a PD-10 column (Amersham-Pharmacia Biotech, Piscataway, N.J.) equilibrated with Buffer A (20 mM Tris, pH 8.0). The eluate was loaded onto a Q-Sepharose High Performance (Amersham-Pharmacia Biotech) column and eluted with a 0-35% Buffer B (20 mM Tris, pH 8.0, 1 M NaCl) gradient. Protein-containing fractions were identified by SDS-PAGE. All protein purification was done using an AKTA Explorer (Amersham-Pharmacia Biotech).

Isolation and Solubilization of Insoluble His-Tag Fusion Proteins.

Bacterial cell pellets were suspended at a ratio of 5:1 (buffer volume:pellet wet weight) in 10 mM NaPO₄/150 mM NaCl/pH 7.0 with Complete Protease Inhibitor Cocktail containing EDTA (Roche Diagnostics GmbH, Mannheim, Germany). The cells were disrupted using a Microfluidizer (Microfluidics Corp., Newton, Mass.) and centrifuged at 21,900×g for 30 minutes at 4° C. The pellet, containing insoluble His-tag proteins, was subjected to a series of detergent extractions followed by a final solubilization step using 6 M urea. The pellet was resuspended in 10 mM NaPO₄/150 mM NaCl/pH 7.0 containing Complete Protease Inhibitor Cocktail and 1.0% Triton X-100 (TX-100) using the same 5:1 ratio described above. The suspension was stirred at 4° C. for 30 minutes and centrifuged at 21,900×g for 20 minutes at 4° C. The supernatant was removed and stored at 4° C. for further analysis. The pellet was subjected to a second TX-100 extraction, as described, and the supernatant was removed and stored at 4° C. for further analysis. The TX-100 pellet was then resuspended in 10 mM NaPO₄/150 mM NaCl/pH 7.0 containing Complete Protease Inhibitor Cocktail and 1.0% Zwittergent 3-14 (Z3-14) and stirred at 4° C. for a minimum of 1 hour. The suspension was centrifuged at 21,900×g for 20 minutes at 4° C. The supernatant was removed and stored at 4° C. for further analysis. The Z3-14 pellet was resuspended in 100 mM Tris-HCl/6 M urea/pH 8.0 and stirred for a minimum of 4 hours at room temperature. The suspension was centrifuged at 21,900×g for 20 minutes at 4° C. and the supernatant was stored at 4° C. for further analysis.

Purification of Solubilized His-Tag Fusion Proteins.

Isolated extracts containing His-tag fusion proteins were identified as described by SDS-PAGE and/or Western blot analysis. Chromatography was carried out using POROS MC 20 micron metal chelate Ni²⁺ media (Perseptive Biosystems, Framingham, Mass.) prepared according to the manufacturer's instructions. Protein extracts were loaded at approximately 5-10 mg of total protein per mL of column media.

For preparations in which the His-tag proteins were soluble in either the cytosolic fraction or the detergent extractions by TX-100 or Z3-14, the material was applied directly to an MC 20 column equilibrated with a minimum of 3 column volumes of 10 mM NaPO₄/150 mM NaCl/pH 7.0 for cytosolic proteins, or the same buffer containing either 1.0% TX-100 or 1.0% Z3-14 for proteins isolated in the TX-100 and Z3-14 extractions, respectively. For cytosolic material, unbound proteins were washed through the column with a minimum of 5 column volumes of equilibration buffer. For TX-100 or Z3-14 containing extracts, unbound proteins were washed through the column with equilibration buffer containing either 0.05% TX-100 or 0.05% Z3-14, depending on the solubility characteristics of the particular protein. His-tag fusion proteins were eluted using a step gradient of 2 column volumes each of 25 mM, 50 mM, 125 mM, and 250 mM imidazole in 10 mM NaPO₄/150 mM NaCl/pH 7.0 containing either 0.05% TX-100 or 0.05% Z3-14. Fractions containing His-tag protein were identified by SDS-PAGE and pooled. Imidazole was removed by dialysis into an appropriate buffer. Protein concentration was determined by BCA assay (Pierce) and, if necessary, preparations were concentrated either by ultrafiltration using Centriprep YM-10 membranes (Millipore, Bedford, Mass.) or by applying the material to a smaller MC 20 column, under the conditions described, and eluting with 250 mM imidazole followed by dialysis. Protein purity was estimated by SDS-PAGE and scanning densitometry.

For preparations in which urea was used to denature and solubilize the protein, the material was diluted 3-fold with 100 mM Tris-HCl/0.05% TX-100/pH 7.5 to give a final urea concentration of 2 M. The material was applied to an MC 20 column equilibrated with a minimum of 3 column volumes of 100 mM Tris-HCl/0.05% TX-100/2 M urea/pH 7.5 and unbound proteins were washed through the column with a minimum of 5 column volumes of equilibration buffer. His-tag fusion proteins were eluted using a step gradient of 2 column volumes each of 25 mM, 50 mM, 125 mM, and 250 mM imidazole in 100 mM Tris-HCl/0.05% TX-100/2 M urea/pH 7.5. Fractions containing His-tag protein were identified by SDS-PAGE and pooled. Imidazole and urea were removed, and the protein was refolded, by dialysis into an appropriate buffer containing 0.05% TX-100. If necessary, preparations were concentrated either by ultrafiltration using Centriprep YM-10 membranes (Millipore, Bedford, Mass.) or by applying the material to a smaller MC 20 column, under the conditions described, and eluting with 250 mM imidazole followed by dialysis. Protein purity was estimated by SDS-PAGE and scanning densitometry.
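The dilution arithmetic and the imidazole step gradient described above lend themselves to a short worked example. The Python sketch below is illustrative only: the function names and the 5 mL column volume are hypothetical, and the only values taken from the text are the 6 M urea extract, the 3-fold dilution, and the four imidazole steps of 2 column volumes each.

```python
IMIDAZOLE_STEPS_MM = [25, 50, 125, 250]   # step concentrations given in the text
COLUMN_VOLUMES_PER_STEP = 2               # each step is 2 column volumes

def diluted_concentration(stock_molar, fold_dilution):
    """Concentration after an n-fold dilution (C_final = C_stock / n)."""
    if fold_dilution <= 0:
        raise ValueError("fold_dilution must be positive")
    return stock_molar / fold_dilution

def diluent_volume(sample_ml, fold_dilution):
    """Volume of diluent that turns sample_ml into an n-fold dilution."""
    return sample_ml * (fold_dilution - 1)

def step_gradient(column_volume_ml):
    """List of (imidazole mM, buffer volume mL) for the elution steps."""
    return [(mm, COLUMN_VOLUMES_PER_STEP * column_volume_ml)
            for mm in IMIDAZOLE_STEPS_MM]

urea_after_dilution = diluted_concentration(6.0, 3)   # 6 M extract diluted 3-fold
buffer_needed = diluent_volume(10.0, 3)                # for a hypothetical 10 mL extract
plan = step_gradient(5.0)                              # hypothetical 5 mL column
```

For a 10 mL urea extract this gives 20 mL of dilution buffer and a final urea concentration of 2 M, matching the equilibration buffer of the column.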

SDS-PAGE & Western Analysis.

SDS-PAGE was carried out as described by Laemmli (Laemmli, 1970), using 10-20% (wt/vol) gradient acrylamide gels (Zaxis, Hudson, Ohio). Proteins were visualized by staining the gels with Simply Blue Safestain (Invitrogen Life Technologies, Carlsbad, Calif.). The gels were scanned with a Personal Densitometer SI (Molecular Dynamics Inc., Sunnyvale, Calif.) and purities were estimated using the Image Quant software (Molecular Dynamics Inc.).

Transfer of proteins to polyvinylidene difluoride (PVDF) membranes was accomplished with a semidry electroblotter and electroblot buffers (Owl Separation Systems, Portsmouth, N.H.). The PVDF membrane, containing the transferred protein, was blocked with 5% non-fat dry milk prepared in PBS (Blotto) for 30 minutes. The membrane was then probed with one of the following primary antibody preparations, at the indicated dilution, specific for the individual protein expression system: Invitrogen anti-Xpress (1:5000), Invitrogen anti-thioredoxin (1:2000), Novagen anti-HSV epitope (1:5000), or Qiagen anti-4×His (1:5000). The membrane was then washed with Blotto and incubated with goat anti-mouse alkaline phosphatase conjugate (1:1500) as the secondary antibody (Biosource International, Camarillo, Calif.). Western blots were developed with the 5-bromo-4-chloro-indolylphosphate-nitroblue tetrazolium (BCIP/NBT) phosphatase substrate system (Kirkegaard and Perry Laboratories, Gaithersburg, Md.).

Protein Quantitation.

Protein concentrations were estimated by the bicinchoninic acid (BCA) assay (Pierce, Rockford, Ill.) with bovine serum albumin as the standard.

Production of Anti-ORF Sera in Mice.

Female Swiss Webster mice (Taconic Farms, Germantown, N.Y.), 6 to 8 weeks old, were immunized subcutaneously in the neck at weeks 0, 4, and 6 with purified His-tag protein. Two separate immunogenic compositions were prepared with each His-tag protein. One immunogenic composition was prepared with the protein formulated with STIMULON™ QS-21 and a second was prepared with the protein formulated with MPL™. Each dose for one group of mice contained 10 μg of purified protein and 20 μg STIMULON™ QS-21, while each dose for the second group of mice contained 10 μg of the same protein and 50 μg MPL™. Serum samples were collected at weeks 0, 4, 6 and 8. Mice were housed in a specific-pathogen-free facility and provided water and food ad libitum.

Pneumococcal Whole-Cell ELISAs.

Streptococcus pneumoniae strains, either type 3 or type 14, were grown in Todd Hewitt broth (Difco) containing 100 μg/ml streptomycin at 35° C. without shaking. The bacteria were grown to mid-log phase (OD₅₅₀<1.0), and heat inactivated for 1 hour at 60° C. Bacteria were pelleted at 10,000×g and resuspended in PBS to an OD₅₅₀=0.1. Fifty-five μl of this suspension was then added to each well of 96-well Nunc plates and air dried at room temperature. Plates were stored at 4° C. until used.

Wells were blocked with 150 μl/well of PBS containing 5% (wt/vol) dry milk (blocking buffer) for 1 hour. Wells were washed 5 times with PBS in a Skantron washer, and mouse sera diluted in blocking buffer (100 μl/well) were added. Plates were incubated at room temperature for 2 hours and unbound antibodies were removed by washing 5 times with PBS in a Skantron washer. Bound antibodies were detected with 100 μl/well of peroxidase-labeled goat anti-mouse IgG (1:1,000 dilution of 1 mg/ml in PBS; KPL) at room temperature for 2 hours. Plates were washed with PBS as above, and developed with 100 μl/well ABTS (KPL) for 25 minutes at room temperature. The reactions were stopped with 100 μl/well of 1% SDS and the OD₄₀₅ of each well was read on a VERSAmax microplate reader (Molecular Devices Corp., Sunnyvale, Calif.). Endpoint titers of each test serum were calculated as the inverse of the highest mean dilution giving an OD₄₀₅=0.1.
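The endpoint titer definition above (the inverse of the highest dilution at which the mean OD₄₀₅ is 0.1) can be sketched in Python as follows. The specification does not say how the value between two tested dilutions was obtained; the log-linear interpolation used here is an assumption, as are the function name and the example readings.

```python
import math

def endpoint_titer(reciprocal_dilutions, mean_ods, cutoff=0.1):
    """Estimate the reciprocal dilution at which the mean OD falls to `cutoff`.

    Interpolates linearly in OD against log10(dilution) between the two
    tested dilutions that bracket the cutoff (an assumed method).
    Returns None if the serum is already below the cutoff at the lowest dilution.
    """
    pairs = sorted(zip(reciprocal_dilutions, mean_ods))
    if pairs[0][1] < cutoff:
        return None
    for (d_lo, od_lo), (d_hi, od_hi) in zip(pairs, pairs[1:]):
        if od_lo >= cutoff > od_hi:
            frac = (od_lo - cutoff) / (od_lo - od_hi)
            log_titer = math.log10(d_lo) + frac * (math.log10(d_hi) - math.log10(d_lo))
            return 10 ** log_titer
    return pairs[-1][0]   # still at or above the cutoff at the highest dilution tested

# Hypothetical two-fold dilution series and mean OD405 readings
titer = endpoint_titer([200, 400, 800, 1600], [0.8, 0.4, 0.2, 0.05])
negative = endpoint_titer([200, 400], [0.05, 0.02])
```

A serum that never reaches the cutoff returns `None`, which corresponds to a titer reported as below the lowest dilution tested.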

FACS Analysis of Streptococcus pneumoniae.

Strains type 3 and 19F were grown in Todd-Hewitt broth + 0.5% yeast extract from frozen stocks of OD₆₀₀-1.0 cells. Incubation was at 37° C. for 3 to 4 hours without shaking. 2-3×10⁷ cells (100 μl of OD₆₀₀=0.5 culture for type 3 and 50 μl for 19F) were pipetted into a 96-well microtiter plate and spun at 4000 rpm in an Eppendorf tabletop centrifuge for 5 minutes. Supernatant was aspirated and cells were resuspended in 95 μl PBS-0.5% BSA-0.1% gelatin. Five μl primary antibody was added, mixed and left incubating on ice for 1 hour. Cells were pelleted as before, washed twice with 100 μl buffer and resuspended in 99 μl buffer. One μl goat anti-mouse secondary antibody conjugated to Alexa Fluor 488 (Molecular Probes, Eugene, Oreg.) was added to the samples, mixed and left incubating on ice for 30 minutes. Cells were washed as before and resuspended in 100 μl buffer. Before analysis, samples were diluted to 1 ml with buffer. Samples were read on a Becton Dickinson FACSVantage SE unit with an Enterprise II laser. Excitation was at 488 nm and emission was detected with a photomultiplier tube using a 530/30 filter. Week 0 antisera were run as background control for the week 8 antisera.
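Because week 0 antisera serve as the background control for week 8 antisera, surface reactivity can be expressed as a fold increase in geometric mean fluorescence intensity. The Python sketch below shows one way to compute such a ratio; the function names and the per-cell intensity values are hypothetical, and the instrument software may compute its statistics differently.

```python
import math

def geometric_mean(values):
    """Geometric mean of positive fluorescence intensities."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be a non-empty list of positive numbers")
    return math.exp(sum(math.log(v) for v in values) / len(values))

def fold_increase(week8_events, week0_events):
    """Ratio of week 8 to week 0 (background) geometric mean fluorescence."""
    return geometric_mean(week8_events) / geometric_mean(week0_events)

# Hypothetical per-cell intensities for one antiserum
background = [8.0, 10.0, 12.5]
immune = [72.0, 90.0, 112.5]
ratio = fold_increase(immune, background)
```

The geometric mean is used rather than the arithmetic mean because fluorescence intensities are typically displayed and distributed on a logarithmic scale.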

Comparison of Message from Cells Grown In Vitro and In Vivo.

Messenger RNA (mRNA) levels for specific transcripts can be examined by creating a double-stranded cDNA from the mRNA using reverse transcriptase. This cDNA is then amplified using standard PCR conditions. The resulting amplification products are thus indicative of the message produced. This technique is useful for comparing the expression of specific transcripts under varying environmental conditions, such as growth in culture flasks versus growth in vivo.

Preparation of RNA from Cells Grown In Vitro.

In vitro grown Streptococcus pneumoniae serotypes were grown to log phase in 60 ml THB-0.5% YE at 37° C. with 5% CO₂. Bacterial cells were harvested by centrifugation at 1000×g for 15 minutes at 4° C. The supernatant was aspirated and the cells were resuspended in 1 ml RNAlater (Ambion, Austin, Tex.) and stored for >1 hour at 4° C. The cells were then centrifuged in a microfuge for 5 minutes at 8000×g. The supernatant was aspirated and the cells were resuspended in 100 μl 10% deoxycholate (DOC). 1100 μl of RNAZOL B (Tel-Test, Inc.) were then added and the suspension was mixed briefly by inversion. 120 μl of CHCl₃ were then added, and the sample was mixed by inversion and then centrifuged in a microfuge at full speed for 10 minutes at 4° C. The aqueous layer was removed and the RNA was precipitated by addition of an equal volume of 2-propanol. The RNA was incubated at 4° C. for >1 hour and then centrifuged in a microfuge at full speed for 10 minutes at room temperature. The supernatant was aspirated and the RNA was washed with 75% EtOH and recentrifuged for 5 minutes. The supernatant was aspirated and the RNA was resuspended in 50-100 μl nuclease-free water. DNA was removed from the RNA by treating the sample with RNase-free DNase (DNA FREE, Ambion) for 20 minutes at 37° C., followed by inactivation of the enzyme by addition of the DNA FREE chelator. The purity and yield of the RNA were assessed by measuring the absorbance at 260 nm and 280 nm. Absorbance ratios were typically 1.9-2.0. RNA was stored at −70° C.
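The purity and yield assessment from the 260 nm and 280 nm readings can be written out as a short calculation. The Python sketch below assumes the commonly used conversion of about 40 μg/ml of RNA per A₂₆₀ unit (1 cm path length), which is not stated in the specification; the function names and the example readings are likewise hypothetical.

```python
RNA_UG_PER_ML_PER_A260 = 40.0   # conventional conversion factor (assumption)

def purity_ratio(a260, a280):
    """A260/A280 ratio; the text reports typical values of 1.9-2.0."""
    if a280 <= 0:
        raise ValueError("a280 must be positive")
    return a260 / a280

def rna_concentration_ug_per_ml(a260, dilution_factor=1.0):
    """RNA concentration of the undiluted sample, in ug/ml."""
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor

def rna_yield_ug(a260, dilution_factor, sample_volume_ul):
    """Total RNA in the resuspended sample, in ug."""
    return rna_concentration_ug_per_ml(a260, dilution_factor) * sample_volume_ul / 1000.0

# Hypothetical readings on a 1:50 dilution of a 100 ul RNA sample
ratio = purity_ratio(0.5, 0.25)
conc = rna_concentration_ug_per_ml(0.5, dilution_factor=50)
total = rna_yield_ug(0.5, 50, 100)
```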

Preparation of RNA from Cells Grown In Vivo.

In vivo grown Streptococcus pneumoniae serotypes were harvested from sealed dialysis tubing incubated in the peritoneal cavities of Sprague-Dawley rats as described by Orihuela et al. (2000). Log phase Streptococcus pneumoniae cells were prepared as described above and resuspended to 10⁶ cfu/ml in RPMI media (Celltech) supplemented with 0.4% glucose. One ml of the cell suspension was sealed in a PVDF dialysis membrane with an 80,000 M_w cutoff (SpectraPor). Two such bags were implanted intraperitoneally in 400 g Sprague-Dawley rats (Taconic). The bags remained in the rats for 22 hours, after which the rats were terminated and the bags were harvested. RNA was prepared from the intraperitoneally grown cells as described above.

RT-PCR to Examine Message Levels.

Specific message for each candidate gene was amplified from RNA prepared from both in vitro and in vivo grown cells using RT-PCR. For each reaction, 0.5 μg RNA was incubated with 0.25 μM of the reverse mining primer for 3 minutes at 75° C., then cooled on ice and transferred to 44° C. The message was reverse transcribed using the RETROscript (Ambion) kit according to the manufacturer's directions. ReddyMix (ABgene) was used according to the manufacturer's directions to amplify each message from 2-5 μl of the sample, using 0.25 μM of the above reverse primer and the forward mining primer. Following amplification, 10 μl of the amplified product was electrophoresed on a 1% agarose gel.

Results

Cloning of ORFs into Expression Vectors.

Fifty-nine ORFs were selected for cloning and expression based on prediction of surface exposure from genomic analysis as described above. These ORFs were amplified by PCR and cloned into the expression vectors as described in Materials and Methods. The ORFs were cloned into pBAD/Thio-TOPO and pCR-T7/NT-TOPO. Both vectors fuse a hexahistidine tag and a unique epitope to facilitate purification and identification by Western blot, respectively. The pBAD vector also fuses a thioredoxin moiety to the cloned protein to enhance solubility.

Expression of ORFs in E. coli.

The genes encoding all 59 ORFs were induced in the appropriate host E. coli strains and examined for expression by SDS-PAGE and Western blot analysis of whole cell extracts. Of the 59 ORFs, a total of 24 (41%) were expressed at detectable levels. Fourteen of the ORFs that did not express in either of the expression vectors were cloned into pET27b(+), which fuses a hexahistidine tag to the C-terminus and a pelB leader sequence to the N-terminus of the protein. One of the 14 ORFs cloned into pET27b(+) expressed protein.

Purification of Expressed ORF Proteins.

All of the expressed ORFs contained a 6×His motif to aid in purification. Initial purification of all of the proteins was done using a Ni-containing resin according to the manufacturer's directions. Twenty of the expressed ORF proteins were purified to acceptable levels of homogeneity for immunization studies using this affinity purification (Table 17). Specific purification conditions used are detailed in Materials and Methods and in Table 17. Thirteen of the 20 ORF proteins were used to immunize mice and obtain antisera specific for the expressed protein.

TABLE 17
Purification of Expressed S. pneumoniae ORF Proteins

ORF # | [Protein] mg/ml | Total Protein mg | Purity % | Final Buffer | "PSORT" Predicted Location | Location in E. coli
75 | 0.52 | 6.8 | 94% | PBS/1 mM EDTA pH 7.4 | Outer membrane | Cytosol
2615 | 0.42 | 16.8 | 80% | PBS/1 mM EDTA pH 7.4 | Outer membrane | Cytosol
3039 | 0.53 (0.14) | 2.91 | 82% | 0.1 M Tris/150 mM NaCl/0.05% Zw3-14/1 mM EDTA pH 8.0 | Outer membrane | Inclusion bodies
1143 | 1.4 | 196 | 92% | PBS/0.05% TX-100/1 mM EDTA pH 7.4 | Inner membrane | Inclusion bodies
1835 | 0.5 (0.2) | 10.5 | 91.3% | PBS/0.05% TX-100/1 mM EDTA pH 7.4 | Inner membrane | Inclusion bodies
1568 | 1.0 | 5.0 | >85% | PBS/0.05% TX-100/1 mM EDTA pH 7.4 | Inner membrane | Inclusion bodies
2271 | 4.9 | 122.5 | >90% | PBS, pH 7.4 | Inner membrane | Cytosol
2621 | 1.5 | 4.5 | >90% | PBS, pH 7.4 | Inner membrane | Cytosol
1104 | 2.0 | — | 85-90% | PBS, pH 7.4 | Outer membrane | Cytosol
935 | 0.1 | 0.5 | 85% | 50 mM Glycine-NaOH/150 mM NaCl/0.05% Z3-14 pH 10.0 | Outer membrane | Inclusion bodies
3361 | 1.67 | 3.34 | 98% | PBS/1 mM EDTA pH 7.4 | Inner membrane | Cytosol
339 | 0.91 (0.91) | 127.4 (27.3) | 93.2% (80.8%) | PBS/0.05% TX-100/1 mM EDTA pH 7.4 | Inner membrane | Inclusion bodies
2322 | 0.55 (0.23) | 2.5 (0.92) | 90% | PBS/0.05% TX-100/1 mM EDTA pH 7.4 | Inner membrane | Inclusion bodies
1476 | 1.2 (0.6) | 9.6 | >80% | PBS/0.05% TX-100/1 mM EDTA pH 7.4 | Inner membrane | Inclusion bodies
3115 | 0.2 (0.5) | 2.8 | >85% | PBS/0.05% TX-100/1 mM EDTA pH 7.4 | Inner membrane | Inclusion bodies
132 | 4.6 | 460 | 95% | PBS pH 7.4 | — | Cytosol
3386 | 3.1 | 27 | 85% | PBS pH 7.4 | Inner membrane | Cytosol
2112 | 0.6 | 1.8 | 85% | PBS pH 7.4 | Inner membrane | Cytosol
916 | 0.26 | 1.3 | >85% | PBS 0.05% TX-100 pH 7.4 | — | Inclusion bodies
3373 | 0.97 | 1.9 | 84% | PBS 0.05% Z3-14 pH 7.4 | Inner membrane | Inclusion bodies

Expression of ORF Proteins in Streptococcus pneumoniae Whole Cell Lysates.

To determine if the ORFs are being expressed in Streptococcus pneumoniae, whole cell lysates of in vitro grown cells were probed with the antisera in Western blot analysis. Each antiserum was reactive with the purified recombinant protein as a positive control (data not shown). Whole cell lysates from Streptococcus pneumoniae strains type 3, type 14, and type 19F were examined in Western blot, and the results are summarized in Table 18. Proteins from three of the ORFs were undetectable or barely detectable in all of the strains tested. Proteins from eight of the ORFs were expressed in at least 2 of the strains, while proteins from two ORFs were detected in only one of the three strains examined. These results demonstrate that the majority of the proteins from these ORFs were expressed in late log, early stationary phase Streptococcus pneumoniae, and that some strains may not express detectable amounts of each ORF at the time point examined.

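TABLE 18
Whole Cell ELISA and Western Blot Expression Data for S. pneumoniae ORFs

Vaccine (10 μg) | Adjuvant (20 μg) | Whole Cell ELISA, Type 3 | Whole Cell ELISA, Type 14 | Western Blot Expression In vitro, Type 3 | Western Blot Expression In vitro, Type 14 | Western Blot Expression In vitro, Type 19F | FACS Analysis, Type 3 | FACS Analysis, Type 19F
2615 | QS21 | <200 | <200 | − | − | − | − | −
3039 | QS21 | <200 | <200 | + | ++ | ++ | − | −
75 | QS21 | 256 | <200 | +++ | +++ | +++ | + | −
1568 | QS21 | 4,018 | <200 | ++ | +++ | +++ | − | −
1143 | QS21 | 779 | <200 | + | ++ | + | + | −
1835 | QS21 | 202 | <200 | − | +/− | − | + | −
2271 | QS21 | 442 | <200 | +++ | +++ | +++ | + | −
2621 | QS21 | 739 | <200 | ++ | + | − | ++ | −
1104 | QS21 | 409 | <200 | +++ | +++ | +++ | + | −
339 | QS21 | <200 | <200 | − | +/− | − | − | ND
2322 | QS21 | <200 | <200 | − | − | +/− | − | ND
3361 | QS21 | <200 | <200 | − | + | + | + | ND
935 | QS21 | <200 | <200 | − | − | − | − | ND
Standard | | ~45,000 | ~10,000 | ND | ND | ND | |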

Surface Exposure of ORF Proteins: Whole Cell ELISA.

The 13 antisera against the recombinant ORF proteins were tested for surface reactivity by whole cell ELISA against two strains of Streptococcus pneumoniae, type 3 and type 14. The results are shown in Table 18. Seven of the 13 antisera gave detectable whole cell titers against type 3 Streptococcus pneumoniae, while none of them gave detectable titers against the type 14 strain. When anticapsular serum was tested against the homologous capsular serotype, the titer against the type 14 strain was much lower than that against the type 3 strain (see row labeled “standard” in Table 18). This result indicated that there might have been sensitivity issues with the type 14 whole cell ELISA, because the Western blot data clearly demonstrate that type 14 Streptococcus pneumoniae do express the majority of the proteins of the ORFs (Table 18). The whole cell ELISA titers of antiserum against the proteins of ORF 75 (SEQ ID NO:218), ORF 1104 (SEQ ID NO:282), ORF 2621 (SEQ ID NO:363), ORF 1568 (SEQ ID NO:306), ORF 1143 (SEQ ID NO:285), ORF 2271 (SEQ ID NO:343), and ORF 1835 (SEQ ID NO:315) ranged from slightly above background to 20 times above background. These results indicate that these antisera detect at least some surface exposed epitopes for these ORFs.

Surface Exposure of ORF Proteins: FACS Analysis.

The polyclonal antisera against the proteins from ORFs 2615, 3039, 75, 1568, 1143, 1835, 2271, 2621, 1104, 339, 2322, 3361 and 935 were analyzed for surface reactivity with whole Streptococcus pneumoniae cells by FACS analysis as described above. The results of the analyses are shown in Table 18. Streptococcus pneumoniae type 3 cells showed a 9-fold increase in geometric mean fluorescence intensity when labeled with antiserum to ORF 2621 (SEQ ID NO:363). A weaker fluorescence signal was detected with antisera directed against the proteins of ORF 1835 (SEQ ID NO:315), ORF 2271 (SEQ ID NO:343), ORF 75 (SEQ ID NO:218), ORF 1143 (SEQ ID NO:285), and ORF 1104 (SEQ ID NO:282). Nine of the antisera tested did not show any detectable surface reactivity with the Streptococcus pneumoniae type 19F strain. This may be due to the level of sensitivity of the technique or to the capsule of 19F covering the surface exposed proteins more completely under the conditions tested.

Analysis of ORF mRNA Expression In Vitro vs. In Vivo.

Forward and reverse mining primers were used to amplify the full-length message for several ORFs, identified by mining algorithms as potential vaccine antigens (Example 1), from type 3 and type 14 cells grown under in vitro and in vivo conditions. In three of the four ORFs examined, message was detected in both in vitro and in vivo grown cells. For ORFs 1104 (SEQ ID NO:282) and 1568 (SEQ ID NO:306), the detection of message correlated with the presence of an immunoreactive band on a Western blot of whole cell lysates for the same serotypes. However, for ORF 2322 (SEQ ID NO:345), message was detected in both serotype 3 and 14, but no immunoreactive band was present for those serotypes, indicating that either the protein was secreted or that the antibodies generated by the recombinant protein did not recognize the native protein. No message was detected for ORF 935 (SEQ ID NO:265) in either growth condition, which correlates with the absence of an immunoreactive band on a Western blot. In a separate experiment, message of the expected size was detected from RNA made from serotype 14 grown in vitro for ORFs 1143 (SEQ ID NO:285), 1475 (SEQ ID NO:298), 3039 (SEQ ID NO:380), 2271 (SEQ ID NO:343), 3115 (SEQ ID NO:388) and 3361 (SEQ ID NO:402) (data not shown).

Discussion

Prediction of surface exposure is a critical step for genomic mining efforts for identifying candidate antigens. The algorithms utilized herein have been shown in the past to have predictive value for selecting candidate ORFs to examine. The results shown here demonstrate the utility of the algorithms for Streptococcus pneumoniae and that they represent an advance over the previously utilized algorithms. Here, 7 out of 13 proteins from ORFs tested are shown to be surface exposed by at least two of the techniques employed. These techniques, including whole cell ELISA and FACS analysis of whole Streptococcus pneumoniae cells, have different strengths for detection of surface exposed epitopes of proteins. Whole cell ELISA utilizes fixed cells bound to a solid phase support, while FACS analysis uses living Streptococcus pneumoniae in liquid suspension. However, the whole cell ELISA is more sensitive than the FACS analysis, and can thus give a more quantitative determination of surface exposed epitopes at low levels of antibody binding. It is not known why the protein of ORF 2621 was so strongly positive in the FACS analysis, yet had a comparatively low whole cell ELISA titer (Table 18). This may be the result of differing growth conditions or the differing detection conditions employed in each of the assays. However, the data are consistent in that the proteins from 6 ORFs that are noted to have surface exposed epitopes all are positive in both assays employed.
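The agreement between the two surface-exposure assays can be checked by tabulating the type 3 results and intersecting the positive sets. The Python sketch below uses the type 3 whole cell ELISA titers and FACS scores as read from Table 18; the data structures and variable names are illustrative assumptions, titers reported as <200 are represented as None, and the table's minus sign is written as an ASCII hyphen.

```python
# Type 3 whole cell ELISA titers as read from Table 18 (None means <200)
elisa_type3_titer = {
    2615: None, 3039: None, 75: 256, 1568: 4018, 1143: 779, 1835: 202,
    2271: 442, 2621: 739, 1104: 409, 339: None, 2322: None, 3361: None,
    935: None,
}

# Type 3 FACS scores as read from Table 18 ("-" means no detectable reactivity)
facs_type3_score = {
    2615: "-", 3039: "-", 75: "+", 1568: "-", 1143: "+", 1835: "+",
    2271: "+", 2621: "++", 1104: "+", 339: "-", 2322: "-", 3361: "+",
    935: "-",
}

elisa_positive = {orf for orf, titer in elisa_type3_titer.items() if titer is not None}
facs_positive = {orf for orf, score in facs_type3_score.items() if score.startswith("+")}

positive_in_both = elisa_positive & facs_positive      # supported by both assays
positive_in_either = elisa_positive | facs_positive    # supported by at least one assay
```

Under this reading, the intersection contains six ORFs (75, 1104, 1143, 1835, 2271 and 2621), which is the group described above as positive in both assays.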

The lack of detection of surface exposure in the 19F strain by FACS is puzzling. None of the ORFs had detectable epitopes on the surface of the 19F strain in the FACS technique used, but the majority of them were well expressed in whole cell lysates from this strain (Table 18). This may be due to the unique capsular material of 19F covering the surface exposed proteins, or to the FACS technique being less sensitive against type 19F cells. It is also possible that none of the proteins tested have surface exposed epitopes in type 19F, but this is extremely unlikely, since even antiserum against another known candidate (PhpA protein) (Zhang et al., 2001) that is surface exposed produced much less detectable surface antibody binding in FACS analysis as compared to type 3 cells (data not shown).

The failure to detect surface reactive antibody in the type 14 whole cell ELISA (Table 18) was also most likely due to the growth of the cells or the assay conditions, because the standard sera employed gave a much lower titer than normally observed.

The RT-PCR data serve to reinforce the potential of the candidate proteins from these ORFs. The data show that Streptococcus pneumoniae grown either in vitro or in vivo produce mRNA specific for the ORFs examined. Since it is known that the ORFs are expressed in vitro, it is likely that they are also expressed in vivo. Experiments are in progress to confirm this using whole cell lysates from in vivo grown cells.

Not every ORF analyzed could be shown to be expressed in Streptococcus pneumoniae. For example, a protein from ORF 935 was not detected by Western blot analysis, whole cell ELISA (Table 18), or RT-PCR (data not shown). It may be that ORF 935 is only expressed under “real” in vivo conditions or that the sequencing of the region is incorrect and the expressed protein is out of frame with the true protein produced by Streptococcus pneumoniae.

Example 3 Streptococcus pneumoniae Proteome Analysis Materials and Methods

Bacteria and Media.

S. pneumoniae type 3 (ATCC #6303) was obtained from the American Type Culture Collection, Manassas, Va. S. pneumoniae type 19F was obtained from Dr. Gerald Schiffman, State University of New York, Brooklyn, N.Y. A glycerol stock plate on Tryptic Soy Agar II (TSA II)/5.0% sheep blood plate (Becton Dickinson Microbiology Systems, Cockeysville, Md.) was prepared and incubated overnight at 37° C. in the presence of 5.0% CO₂. Cells from each plate were transferred to 20 ml of Todd-Hewitt Broth/0.5% Yeast Extract (THY) and incubated overnight at 37° C. with gentle shaking (10 rpm) in the presence of 5.0% CO₂. For type 3, the culture was then diluted 10-fold with 100 ml of THY. For type 19F, the culture was then diluted 40-fold with 200 ml of THY. Both of these diluted cultures were subsequently incubated under the above conditions. Type 19F required 9 h incubation time to reach a concentration of 1×10⁹ cells/ml. Type 3 was incubated overnight and its concentration was not determined.

Isolation of Membrane Fraction.

The bacterial cultures were spun down and washed with PBS/MgSO₄ (30 mM sodium phosphate/150 mM NaCl/1 mM MgSO₄, pH 6.8). The pellets were resuspended in 4 ml of PBS/MgSO₄ containing 5 μg Lysozyme (Sigma Chemical Co., St. Louis, Mo.) and 400 μg Mutanolysin (Sigma). The samples were incubated at 37° C. for 1 hour with shaking. After the incubation, ~300 units of RNAse Cocktail™ (Ambion Inc., Austin, Tex.) was added to each sample. The samples were centrifuged at low speed using a tabletop centrifuge (2.5 k rpm, 10 min, at 4° C.). The supernatant was subsequently spun at high speed to pellet the membrane fractions using a Beckman (Beckman Instruments, Inc., Palo Alto, Calif.) Model L8-70M Preparative Ultracentrifuge (60Ti rotor, at 40 k rpm, 4° C., 1 h). The supernatant was removed and the membrane pellet was washed with PBS/MgSO₄.

Trypsin Digestion of Excised SDS-PAGE Gel Bands.

Mini SDS-PAGE gels (10 cm×10 cm) were run with precast 10-20% (w/v, acrylamide) gradient gels (Zaxis, Hudson, Ohio) at 200 V. The See Blue molecular weight standard used was obtained from Invitrogen, Carlsbad, Calif. The gels were stained with Simply Blue Safestain, a colloidal Coomassie Blue G250 stain (Invitrogen), as per the manufacturer's instructions. Each sample lane, in its entirety, was cut into 15 different bands. For each sample, bands representing identical molecular weight areas of the gel from three sample lanes, run next to each other, were collected together for further processing. The gel slices were washed twice with 0.5 ml of 50% (v/v) aqueous HPLC grade acetonitrile (Burdick & Jackson, Muskegon, Mich.) for 5 min with gentle shaking and stored frozen at −20° C. following removal of the wash liquid. Frozen gel bands were thawed, cut into 1 mm cubes and subjected to in-gel trypsin digestion using a DigestPro robot (ABIMED Analysen-Technik GmbH, Langenfeld, Germany). In the configuration used, up to 30 samples could be processed simultaneously. The automated protocol consisted of the following steps in order: reduction of the protein in the gel bands with dithiothreitol, alkylation with iodoacetamide, digestion with trypsin and elution of the peptides. Sequencing Grade Modified Trypsin obtained from Promega Corporation, Madison, Wis. was used. This trypsin is highly specific for hydrolysis of peptide bonds at the carboxylic sides of lysine and arginine residues. It is modified by reductive methylation to make it extremely resistant to autolysis, which can generate pseudotrypsin with chymotrypsin-like specificity. Specificity is further improved by treatment with L-1-chloro-3-tosylamido-4-phenylbutan-2-one (TPCK) followed by affinity purification. The peptide digests were collected, dried using a SpeedVac (Thermo Savant, Holbrook, N.Y.) to ~10 μl, and subsequently diluted to 50 μl with 0.1 M acetic acid. Samples were transferred to plastic autosampler vials, sealed, and injected using a 5 μl sample loop.
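The cleavage specificity described above (hydrolysis on the carboxyl side of lysine and arginine) can be illustrated with a minimal in silico digest in Python. This is a simplified sketch: the function name and the example sequence are hypothetical, missed cleavages are not modeled, and the commonly applied exception for a proline following the cleavage site is deliberately left out.

```python
def tryptic_peptides(protein):
    """Split a protein sequence after every lysine (K) or arginine (R).

    Simplified model of complete tryptic digestion: no missed cleavages
    and no special handling of K/R followed by proline.
    """
    protein = protein.upper()
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        if residue in "KR":
            peptides.append(protein[start:i + 1])   # peptide ends with K or R
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])            # C-terminal peptide
    return peptides

# Hypothetical 9-residue sequence
peptides = tryptic_peptides("MKTAYRGGK")
```

Each peptide other than the C-terminal one ends in K or R, which is the constraint a database search applies when matching fragmentation spectra to predicted tryptic peptides.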

Microcapillary LC-Mass Spectrometry.

Mass spectral data were acquired on a Thermo Finnigan LCQ DECA quadrupole ion trap mass spectrometer (Thermo Finnigan, San Jose, Calif.) equipped with a microcapillary reversed-phase HPLC/micro-electrospray interface. Peptide extracts were analyzed on an automated microelectrospray reversed-phase HPLC. The microelectrospray interface consisted of a Picofrit fused silica spray needle, 10 cm length by 75 μm ID, 15 μm orifice diameter (New Objective, Cambridge, Mass.) packed with 10 μm C₁₈ reversed-phase beads (YMC, Wilmington, N.C.) to a length of 10 cm. The Picofrit needle was mounted in a fiber optic holder (Melles Griot, Irvine, Calif.) held on a base positioned at the front of the mass spectrometer detector. The rear of the column was plumbed through a titanium union to supply an electrical connection for the electrospray interface. The union was connected with a length of fused silica capillary (FSC) tubing to a FAMOS autosampler (LC-Packings, San Francisco, Calif.) that was connected to an HPLC solvent pump (ABI 140C, Perkin-Elmer, Norwalk, Conn.). The HPLC solvent pump delivered a flow of 50 μL/min, which was reduced to 250 nl/min using a PEEK microtight splitting tee (Upchurch Scientific, Oak Harbor, Wash.), and then delivered to the autosampler using an FSC transfer line. The HPLC pump and autosampler were each controlled using their internal user programs.

Five microliters of the tryptic digest was separated using the C₁₈ microcapillary HPLC column eluting directly into the orifice of the mass spectrometer. Peptides were separated at a flow rate of 250 nl/min using a 50 minute gradient of 4-65% (v/v) acetonitrile in 0.1 M acetic acid. Peptide analyses were conducted on the LCQ-DECA ion trap mass spectrometer operating at a spray voltage of 1.5 kV, and using a heated capillary temperature of 140° C. Data were acquired in automated MS/MS mode using the data acquisition software provided with the instrument. As the peptides elute from the HPLC into the mass spectrometer, they are detected and fragmented in a data dependent manner using “dynamic exclusion”. In this technique, the ion trap cycles between full scan and collision induced dissociation (CID) mode, first detecting candidate ions, and then collecting them for fragmentation. Decisions about which ions are going to be fragmented are made by the instrument “on the fly”. The ions, once collected, are then added to an exclusion list and are rejected for a window of two minutes. This technique allows the instrument to distribute its time efficiently when presented with analytes of very high complexity. The operation can result in the collection of as many as 1000 to 2000 fragmentation (CID) spectra in a single run. The acquisition method included 1 MS scan (375-600 m/z) followed by MS/MS scans of the top 2 most abundant ions in the MS scan. The instrument then conducted a second MS scan (600-1000 m/z) followed by MS/MS scans of the top 2 most abundant ions in that scan. The dynamic exclusion and isotope exclusion functions were employed to increase the number of peptide ions that were analyzed (settings: 3 amu=exclusion width, 3 min=exclusion duration, 30 sec=pre-exclusion duration, 3 amu=isotope exclusion width). For the current experiment involving 30 samples, the data were collected in a completely automated fashion over 48 hours using the autosampler.
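The data-dependent selection described above can be illustrated with a minimal Python sketch. It is not the instrument's control software: the function name `select_precursors`, the scan data layout, and the reading of the 3 amu exclusion width as a window centered on the excluded m/z (i.e., ±1.5 amu) are assumptions made for illustration, and the pre-exclusion and isotope exclusion settings are not modeled.

```python
def select_precursors(scans, top_n=2, width_amu=3.0, duration_min=3.0):
    """Pick the top_n most abundant ions per MS scan, skipping excluded m/z values.

    scans: list of (time_min, [(mz, intensity), ...]) in chronological order.
    Returns a list of (time_min, [selected mz, ...]).
    """
    excluded = []  # (mz, expiry_time_min) pairs currently on the exclusion list
    picks = []
    for time_min, ions in scans:
        # Drop exclusion entries whose duration has elapsed.
        excluded = [(mz, exp) for mz, exp in excluded if exp > time_min]
        # Keep only ions outside every active exclusion window (assumed +/- width/2).
        candidates = [
            (mz, inten) for mz, inten in ions
            if all(abs(mz - ex_mz) > width_amu / 2 for ex_mz, _ in excluded)
        ]
        candidates.sort(key=lambda ion: ion[1], reverse=True)
        chosen = [mz for mz, _ in candidates[:top_n]]
        # Newly fragmented ions are excluded for duration_min minutes.
        excluded.extend((mz, time_min + duration_min) for mz in chosen)
        picks.append((time_min, chosen))
    return picks


example_scans = [
    (0.0, [(500.0, 100), (450.0, 80), (400.0, 10)]),
    (1.0, [(500.5, 100), (450.0, 80), (400.0, 10), (420.0, 5)]),
    (3.5, [(500.0, 100), (400.0, 1)]),
]
example_picks = select_precursors(example_scans)
```

In the hypothetical example, the abundant ions near m/z 500 and 450 are fragmented in the first scan and skipped in the second, so the weaker ions at m/z 400 and 420 are selected instead; this is the behavior that lets less abundant peptides be sampled.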

Sequence Database Search for Identification of Proteins from CID Spectra.

Automated analysis of MS/MS data was performed using the SEQUEST computer algorithm (Eng, McCormack and Yates, 1994) incorporated into the Finnigan Bioworks data analysis package (ThermoFinnigan, San Jose, Calif.) using the protein sequence databases described below. Because SEQUEST is highly computation intensive, the searches for this study were performed on a dedicated 12×600 MHz PC cluster. Peptide matches with Xcorr values greater than 2.0 were loaded into a database for further computational analysis followed by manual verification of the data where necessary (as described below).

Results and Discussion

Proteomics Based Approach

The term ‘proteome’ has been defined as the proteins expressed by the genome of an organism or tissue. One of the primary goals of analysis of the proteome, or proteomics, involves identification of proteins in a large-scale high-throughput format.

Bacterial membrane preparations constitute a very important source for surface localized proteins, which are likely candidate antigens. A proteomics based approach was taken to identify the protein components of the complex mixture of proteins contained in the membrane fraction of Streptococcus pneumoniae. The study of membrane associated proteins offers a very specific and significant challenge for proteomics. The detergents required to keep these proteins in aqueous solution usually interfere with analytical methods. During two-dimensional (2-D) gel electrophoresis, which has been widely used for the analysis of soluble proteins, severe quantitative loss of membrane proteins is often observed. The problem is more severe when immobilized pH gradients are used in the first dimension. To minimize such solubility problems with membrane preparations from some other bacteria, several sample preparations, as well as some novel zwitterionic detergents were tested; all of which were shown to improve the analysis of membrane proteins by 2-D gel electrophoresis. However, applicants believe their success in identifying the major set of outer membrane proteins was quite limited. In view of this, a novel combination of a very simple and a very complex method for identification of the membrane proteome component of Streptococcus pneumoniae has been applied, as described below.

In this approach, the membrane preparation was first separated by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) using a mini gel format, followed by staining of the gel with a colloidal Coomassie blue stain. Fifteen gel bands containing the entire sample lane were excised and the bands digested individually with trypsin. The tryptic peptides were analyzed using microcapillary reversed-phase liquid chromatography-micro-electrospray tandem mass spectrometry (LC-MS/MS) on a Finnigan LCQ Deca quadrupole ion trap mass spectrometer. Tandem mass spectrometry (MS/MS) has been shown to be a powerful approach to analyze proteins (Eng, McCormack and Yates, 1994). In the first step, MS/MS uses a mass analyzer to separate a peptide ion from a mixture of ions, then uses a second step or mass analyzer to activate and dissociate the ion of interest. This process, known as collision-induced dissociation (CID), causes the peptide to fragment at the peptide bonds between the amino acids, and the fragmentation pattern of a peptide is used to determine its amino acid sequence. The SEQUEST computer algorithm (Eng, McCormack and Yates, 1994) was used to search the uninterpreted experimental fragmentation spectra against protein or translated nucleotide sequence databases to identify the proteins present in each gel band. SEQUEST conceptually digests protein sequences in a database into tryptic peptides and then models them into simulated CID spectra using the known rules of peptide fragmentation. SEQUEST then compares these simulated CID spectra against the experimental spectra and returns a list of probable peptide sequences matching the raw data along with different parameters representing the fidelity of the match. For peptides above roughly 800-900 Dalton in size, a single spectrum can uniquely identify a protein.
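The conceptual digestion step can be sketched in a few lines of Python. This is only an illustration of cleavage on the carboxylic side of lysine (K) and arginine (R), as described for trypsin above; it is not SEQUEST's implementation. The function name `tryptic_digest` and the example sequence are hypothetical, and refinements such as missed cleavages or suppressed cleavage before proline are deliberately left out.

```python
def tryptic_digest(protein, min_length=1):
    """Split a protein sequence after every K or R residue.

    Returns the resulting peptides that have at least min_length residues.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        if residue in "KR":
            # Cleave on the C-terminal side of lysine or arginine.
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # Keep the C-terminal peptide that does not end in K or R.
        peptides.append(protein[start:])
    return [p for p in peptides if len(p) >= min_length]


example_peptides = tryptic_digest("MKTAYIAKQRQISFVK")
example_long_peptides = tryptic_digest("MKTAYIAKQRQISFVK", min_length=6)
```

Each peptide produced this way would then be turned into a simulated CID spectrum and compared with the experimental spectra; that scoring step is not modeled here.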

To obtain sequence information on multiple peptides from the complex mixture generated by trypsin digestion of the SDS-PAGE gel bands, a reversed phase chromatography system was coupled to an electrospray ion trap mass spectrometer. In this system, it is known that high sensitivity (down to sub-femtomole levels) can be attained by minimizing both flow rate and column diameter to concentrate the elution volume and direct as much of the column effluent as possible into the orifice of the mass spectrometer. To maximize the coverage of proteins present in the sample, the data-dependent acquisition feature of the ion trap was employed. Dynamic exclusion was used to prevent reacquisition of tandem mass spectra of ions once a spectrum had been acquired for a particular m/z value. Use of these data-dependent features dramatically increased the number of peptide ions that were selected for CID analysis.

The LC-MS/MS data acquisition conditions described above typically resulted in fragmentation data for more than 2000 peptide ions for each run. Using the SEQUEST algorithm, these data were correlated against two protein sequence databases. The first one, 5 nA6F6, contained open reading frames obtained from translation of the Streptococcus pneumoniae type 4 genome sequence (TIGR4) in all six reading frames, with the smallest peptide containing six amino acid residues. The second one, nr, is a non-redundant GenBank protein sequence database. SEQUEST search conditions used trypsin selectivity for both of the searches. The 5 nA6F6 search allowed a differential search of +16 Dalton for methionine residues to account for peptides displaying oxidation of methionine.
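A six-frame translation of the kind used to build the first database can be sketched as follows. This is a minimal Python illustration, not the procedure actually used to build the database: it assumes the standard codon assignments, treats each stop-to-stop segment as one open reading frame, and keeps segments of at least six residues. The names `six_frame_peptides` and `reverse_complement` and the example sequence are hypothetical.

```python
from itertools import product

# Standard codon assignments listed in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def six_frame_peptides(dna, min_length=6):
    """Translate both strands in all three offsets and split at stop codons."""
    dna = dna.upper()
    peptides = []
    for strand in (dna, reverse_complement(dna)):
        for offset in range(3):
            protein = "".join(
                CODON_TABLE.get(strand[i:i + 3], "X")  # 'X' for ambiguous codons
                for i in range(offset, len(strand) - 2, 3)
            )
            # Keep only stop-to-stop segments of at least min_length residues.
            peptides.extend(
                segment for segment in protein.split("*")
                if len(segment) >= min_length
            )
    return peptides


example_dna = "ATGAAAGCTGCTGCTGCTTAA"
example_orfs = six_frame_peptides(example_dna)
```

Searching such a translated database, rather than only annotated genes, allows spectra to match open reading frames that gene-finding programs did not predict.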

Candidate matches identified by SEQUEST were confirmed using the following procedure. For each peptide, SEQUEST computes an Xcorr value from cross correlation of the experimental MS/MS spectrum with the candidate peptides in the sequence database. The Xcorr is a measure of the similarity of the experimental MS/MS data to that generated from the sequence database. Peptide matches with Xcorr values greater than 2.0 were selected for further analysis and loaded onto an in-house developed system for analysis of SEQUEST data using the commercially available Oracle® relational database system. Since the SEQUEST output is quite complex, applicants incorporated a new scoring algorithm in Oracle® to calculate a match score for each protein identified as follows:

Protein Score = n × Σ(Xcorr/rank)

where the rank is that assigned by SEQUEST for each peptide sequence identified from a specific protein sequence in the database and n is the number of unique peptides identified for that protein, since the same peptide may be identified multiple times in an LC-MS/MS experiment. The fragmentation spectra for all moderate or weak assignments by the software used were checked manually by direct examination of the CID spectra for reasonable signal/noise ratio, and the list of matched ions was also examined for reasonable continuity. Generally, three or more spectra converging with a reasonable Protein Score (usually >25) or Xcorr values (usually >2.5) onto a single database entry constitute a convincing identification.
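The scoring step can be sketched in Python under one stated assumption: that the Protein Score multiplies n, the number of unique peptides, by the sum of Xcorr/rank over those peptides. That reading is consistent with the thresholds quoted above (three rank-1 peptides with Xcorr of 3.0 give 3 × 9 = 27, just above 25), but it is an interpretation and not a confirmed transcription of applicants' algorithm. The function name `protein_scores`, the tuple layout, and the choice to keep the best Xcorr/rank when a peptide is observed more than once are likewise assumptions.

```python
from collections import defaultdict


def protein_scores(matches, min_xcorr=2.0):
    """Compute an assumed Protein Score = n * sum(Xcorr / rank) per protein.

    matches: iterable of (protein_id, peptide, xcorr, rank) tuples.
    Only matches with xcorr greater than min_xcorr are considered.
    """
    best = defaultdict(dict)  # protein -> {peptide: best Xcorr/rank seen}
    for protein, peptide, xcorr, rank in matches:
        if xcorr <= min_xcorr:
            continue  # below the Xcorr cutoff used for loading matches
        ratio = xcorr / rank
        # Count each unique peptide once, keeping its best-scoring spectrum.
        if ratio > best[protein].get(peptide, 0.0):
            best[protein][peptide] = ratio
    return {
        protein: len(peptides) * sum(peptides.values())
        for protein, peptides in best.items()
    }


example_matches = [
    ("ORF1", "AAAK", 3.0, 1),
    ("ORF1", "AAAK", 2.5, 1),  # repeat observation of the same peptide
    ("ORF1", "CCCR", 3.0, 1),
    ("ORF1", "DDDK", 3.0, 1),
    ("ORF2", "EEEK", 1.5, 1),  # filtered out: Xcorr not above 2.0
    ("ORF2", "FFFR", 4.0, 2),
]
example_scores = protein_scores(example_matches)
```

Under this assumed formula the hypothetical ORF1 scores 27.0 and would pass the >25 guideline, whereas ORF2, supported by a single rank-2 peptide, scores 2.0 and would be a candidate for manual inspection.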

The rationale behind the experimental proteomics approach for characterization of membrane associated proteins of Streptococcus pneumoniae was that the single SDS-PAGE step circumvented the solubility complications associated with isoelectric focusing in 2-D gel electrophoresis. It also offered a simple fractionation of the membrane preparation according to molecular weight that reduced the complexity of the samples subjected to LC-MS/MS analysis. The combination of these analytical techniques allowed us to separate and obtain sequence information of multiple peptides with high sensitivity over a large concentration range and identify the corresponding proteins by correlation with sequences in databases. As part of this study, a method for the isolation of membrane preparations from Streptococcus pneumoniae was also developed. This involved enzymatic digestion of Streptococcus pneumoniae cell walls with mutanolysin and lysozyme in a hypotonic buffer followed by differential centrifugation. The twenty-eight ORFs representing surface exposed proteins were also identified by the proteomic approach and are presented in Table 11. The ORFs representing membrane associated proteins and identified by the proteomic approach are presented in Table 12. Table 14 contains all the open reading frames identified from the 5 nA6F6 database representing the TIGR4 genomic sequence. Table 14 also contains proteins identified from the nr database search which do not originate from the TIGR4 genome.

Combination of Genomics and Proteomics Approaches

The ORFs identified by proteomics represent surface localized, surface exposed or membrane associated proteins of Streptococcus pneumoniae. Those twenty-eight ORFs that support the putative surface exposed ORFs identified by genomics approaches (i.e., Tables 1-10) are listed in Table 11 and provide further evidence of surface localization of these candidates. The 161 novel ORFs identified by proteomics as membrane associated are listed in Table 12.

Example 4 Immunogold Labeling of Streptococcus pneumoniae and Low Voltage Scanning Electron Microscopy

Surface exposure of proteins on Streptococcus pneumoniae may also be assessed by immunogold labeling of whole bacteria and electron microscopy. Bacterial cells are labeled as previously described (Olmsted et al., 1993). Briefly, late-log phase bacterial cultures are washed twice, resuspended to a concentration of 1×10⁸ cells/ml in 10 mM phosphate buffered saline (PBS) (pH 7.4), and placed on poly-L-lysine coated glass coverslips. Excess bacteria are gently washed from the coverslips and unlabeled samples are placed into fixative (2.0% glutaraldehyde, in a 0.1 M sodium cacodylate buffer containing 7.5% sucrose) for 30 min. Bacteria to be labeled with colloidal gold are washed with PBS containing 0.5% bovine serum albumin, and the pre-immune or hyper-immune mouse polyclonal antibody prepared above is applied for 1 hour at room temperature. Bacteria are then gently washed, and a 1:6 dilution of goat anti-mouse conjugated to 18 nm colloidal gold particles (Jackson ImmunoResearch Laboratories, Inc., West Grove, Pa.) is applied for 10 min at room temperature. Finally, all samples are washed gently with PBS, and placed into the fixative described above. The fixative is washed from samples twice for 10 min in 0.1 M sodium cacodylate buffer, and the samples are postfixed for 30 min in 0.1 M sodium cacodylate containing 1% osmium tetroxide. The samples are then washed twice with 0.1 M sodium cacodylate, dehydrated with successive concentrations of ethanol, critical point dried by the CO₂ method of Anderson (Anderson, 1951) using a Samdri-780A (Tousimis, Rockville, Md.), and coated with a 1-2 nm discontinuous layer of platinum. Streptococcus pneumoniae cells are viewed with a LEO 1550 field emission scanning electron microscope operated at low accelerating voltages (1-4.5 keV) using a secondary electron detector for conventional topographical imaging and a high-resolution Robinson backscatter detector to enhance the visualization of colloidal gold by atomic number contrast.

Example 5 In Vitro Opsonophagocytosis Analysis

An in vitro opsonic reaction, which may mimic the in vivo reaction, is conducted by incubating together a mixture of Streptococcus pneumoniae cells, heat inactivated human serum containing specific antibodies to the pneumococcal strain, and an exogenous complement source. Opsonophagocytosis proceeds during incubation of freshly isolated human polymorphonuclear cells (PMN's) and the antibody/complement/pneumococcal cell mixture. Bacterial cells that are coated with antibody and complement are killed upon opsonophagocytosis. Colony forming units (cfu) of surviving bacteria that escape from opsonophagocytosis are determined by plating the assay mixture. Titers are reported as the reciprocal of the highest dilution that gives ≧50% bacterial killing, as determined by comparison to assay controls. Specimens which demonstrate less than 50% killing at the lowest serum dilution tested (1:8) are reported as having an OPA titer of 4. The highest dilution tested is 1:2560. Samples with 50% killing at the highest dilution are repeated, beginning with a higher initial dilution.
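The titer reporting rules above can be expressed as a short Python sketch. It assumes that percent killing is computed as the reduction in surviving cfu relative to a control count, which the text implies but does not spell out as a formula; the names `percent_killing` and `opa_titer` and all example counts are hypothetical.

```python
BELOW_RANGE_TITER = 4  # reported when killing is below 50% at the lowest dilution (1:8)


def percent_killing(surviving_cfu, control_cfu):
    """Percent reduction in cfu relative to the assay control (assumed formula)."""
    return 100.0 * (1.0 - surviving_cfu / control_cfu)


def opa_titer(cfu_by_dilution, control_cfu, threshold=50.0):
    """Return (titer, needs_repeat) for one serum.

    cfu_by_dilution maps the reciprocal dilution (8, 16, ...) to surviving cfu.
    needs_repeat is True when killing still meets the threshold at the highest
    dilution tested, so the serum should be rerun from a higher starting dilution.
    """
    killing = {
        dilution: percent_killing(cfu, control_cfu)
        for dilution, cfu in cfu_by_dilution.items()
    }
    lowest, highest = min(killing), max(killing)
    if killing[lowest] < threshold:
        return BELOW_RANGE_TITER, False
    titer = max(d for d, k in killing.items() if k >= threshold)
    return titer, titer == highest


example_titer = opa_titer({8: 100, 16: 300, 32: 500, 64: 800}, control_cfu=1000)
```

In the hypothetical example, killing is 90%, 70%, 50% and 20% at dilutions 1:8 through 1:64, so the reported titer is 32 and no repeat is needed.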

The present method is a modification of Gray's method (Gray, B. M. 1990). The assay mixture is assembled in a 96-well microtiter tissue culture plate at room temperature. The assay mixture consists of 10 μL of test serum (a series of two-fold dilutions) heated to 56° C. for 30 minutes prior to testing, 10 μL of precolostral bovine serum (complement source) having no opsonic activity for the bacterial test strain, and 20 μL of buffer containing 2000 viable Streptococcus pneumoniae organisms. This mixture is incubated at 37° C. without CO₂ for 30 minutes with shaking. Next, 40 μL of human PMNs, freshly prepared from heparinized peripheral blood by dextran sedimentation and Percoll density centrifugation and suspended in buffer at a concentration of 1×10⁶/mL, is added. The assay plate(s) are then incubated at 37° C. for an additional 90 minutes with vigorous shaking. Aliquots from each well are dispensed onto the upper ¼ of a 15×100 mm blood agar plate. The blood agar plate is tilted while pipetting to allow the liquid suspension to “run” down the plate. Plates are incubated overnight in 5% CO₂ at 37° C. The viable cfu are counted the following morning. Negative control wells, lacking bacterial cells, test serum, complement and/or phagocytes in appropriate combination, are included in each assay. A test serum control, which contains test serum plus bacterial cells and heat inactivated complement, is included for each individual serum. This control can be used to assess whether antibiotics or other serum components are capable of killing the bacterial strain directly (i.e., in the absence of complement or PMN's). A human serum with known opsonic titer is used as a positive human serum control. The opsonic antibody titer for each unknown serum is calculated as the reciprocal of the initial dilution of serum giving 50% cfu reduction compared to the control without serum.

Example 6 Intranasal or Parenteral Immunization of CBA/CaHN Mice Prior to Challenge

Six-week old, pathogen-free, male CBA/CaHN xid/J (CBA/N) mice are purchased from Jackson Laboratories (Bar Harbor, Me.) and housed in cages under standard temperature, humidity, and lighting conditions. CBA/N mice, at 10 animals per group, are immunized with an appropriate amount of the protein(s) to be tested. For parenteral immunization, the protein is mixed with 100 μg of MPL™ per dose to a final volume of 200 μl in saline and then injected subcutaneously (SC) into mice. All groups receive a booster with the same dose and by the same route 3 and 5 weeks after the primary immunization. Control mice are injected with MPL™ alone. All mice are bled two weeks after the last boosting; sera are then isolated and stored at −20° C. For intranasal (IN) immunization, mice receive three IN immunizations, one week apart. On each occasion, an appropriate dose of the protein to be tested is formulated with 0.1 μg of CT-E29H, a genetically modified cholera toxin that is reduced in enzymatic activity and toxicity (Tebbey et al., 2000), and slowly instilled into the nostril of each mouse in a 10 μl volume. Mice immunized with CT-E29H alone are used as controls. Serum samples are collected one week after the last immunization.

Example 7 LD₅₀ Determination

Six or 12-week old CBA/N mice (10 per group) are challenged intranasally (IN) with 10 μl of a suspension of streptomycin resistant type 3 Streptococcus pneumoniae diluted to 5×10⁹ CFU/ml in PBS. Two-fold serial dilutions of this suspension are also tested. The actual doses of bacteria administered are determined by plating dilutions of the inoculum on streptomycin containing tryptic soy agar plates. The LD₅₀ is calculated by the Reed-Muench method as discussed by Lennette (Lennette, 1995). The LD₅₀ of 13-week old CBA/N mice with type 3 strain was previously shown to be 1×10⁵ CFU, while the LD₅₀ of 6-week old CBA/N mice was 1×10⁴ CFU.
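The Reed-Muench calculation can be sketched in Python as follows. This is the commonly taught form of the method (cumulative deaths summed from the lowest dose upward, cumulative survivors summed from the highest dose downward, and log-linear interpolation between the two doses bracketing 50% mortality); it is offered as an illustration and has not been checked against the specific presentation in Lennette. The function name `reed_muench_ld50` and the mortality counts are hypothetical.

```python
import math


def reed_muench_ld50(doses, deaths, survivors):
    """Estimate the LD50 by Reed-Muench interpolation on a log10 dose scale.

    doses, deaths and survivors are parallel lists with one entry per group.
    """
    order = sorted(range(len(doses)), key=lambda i: doses[i])
    dose = [doses[i] for i in order]
    dead = [deaths[i] for i in order]
    alive = [survivors[i] for i in order]

    # An animal killed by a low dose is assumed to die at any higher dose,
    # and an animal surviving a high dose is assumed to survive lower doses.
    cum_dead = [sum(dead[: i + 1]) for i in range(len(dose))]
    cum_alive = [sum(alive[i:]) for i in range(len(dose))]
    mortality = [
        100.0 * d / (d + a) if (d + a) else 0.0
        for d, a in zip(cum_dead, cum_alive)
    ]

    for lo in range(len(dose) - 1):
        hi = lo + 1
        if mortality[lo] < 50.0 <= mortality[hi]:
            # Proportionate distance of the 50% endpoint between the two doses.
            distance = (50.0 - mortality[lo]) / (mortality[hi] - mortality[lo])
            log_ld50 = math.log10(dose[lo]) + distance * math.log10(dose[hi] / dose[lo])
            return 10.0 ** log_ld50
    raise ValueError("50% mortality is not bracketed by the tested doses")


example_ld50 = reed_muench_ld50(
    doses=[1e3, 1e4, 1e5, 1e6],
    deaths=[0, 2, 8, 10],
    survivors=[10, 8, 2, 0],
)
```

With the hypothetical counts shown (10 mice per group), cumulative mortality is about 16.7% at 10⁴ CFU and 83.3% at 10⁵ CFU, so the interpolated LD₅₀ falls midway on the log scale, at about 3.2×10⁴ CFU. The same function applies to two-fold dilution series, since the interpolation uses the ratio of adjacent doses.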

Example 8 CBA/CaHN xid Mouse Intranasal Challenge Model

Mice are challenged with either serotype 3 or serotype 14 streptomycin resistant Streptococcus pneumoniae. Pneumococci are inoculated into 3 ml of Todd-Hewitt broth containing 100 μg/ml of streptomycin. The culture is grown at 37° C. until mid-log phase, then diluted to the desired concentration with Todd-Hewitt broth and stored on ice until use. Each mouse is anesthetized with 1.2 mg of ketamine HCl (Fort Dodge Laboratory, Ft. Dodge, Iowa) by intraperitoneal (IP) injection. The bacterial suspension is inoculated into the nostril of anesthetized mice (10 μl per mouse). The actual dose of bacteria administered is confirmed by plate count. Two or 3 days after challenge, mice are sacrificed, and the noses are removed and homogenized in 3 ml of sterile saline with a tissue homogenizer (Ultra-Turrax T25, Janke & Kunkel Ika-Labortechnik, Staufen, Germany). The homogenate is 10-fold serially diluted in saline and plated on streptomycin containing TSA plates. Fifty μl of blood collected 2 days post-challenge from each mouse are also plated on the same kind of plates. Plates are incubated overnight at 37° C. and then colonies are counted. CBA/N mice are observed daily after challenge, and the mortality is monitored for 14 days.
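Converting plate counts back to bacterial load is simple arithmetic, sketched here in Python. The 3 ml homogenate volume and the 50 μl blood volume come from the text; the volume of diluted homogenate spread per plate is not stated, so the 0.1 ml used below is purely an assumed value, as are the function name `cfu_in_sample` and the colony counts.

```python
import math


def cfu_in_sample(colonies, dilution_factor, plated_volume_ml, sample_volume_ml):
    """Back-calculate total cfu in a sample from one countable plate.

    colonies: colonies counted on the plate
    dilution_factor: fold dilution of the plated suspension (1 = undiluted)
    plated_volume_ml: volume spread on the plate
    sample_volume_ml: total volume of the original sample
    """
    cfu_per_ml = colonies * dilution_factor / plated_volume_ml
    return cfu_per_ml * sample_volume_ml


# Nasal homogenate: 42 colonies from the 1:100 dilution, 0.1 ml plated (assumed),
# nose homogenized in 3 ml of saline.
example_nasal_cfu = cfu_in_sample(42, 100, 0.1, 3.0)

# Blood: 7 colonies from 50 microliters plated undiluted, expressed per ml of blood.
example_blood_cfu_per_ml = cfu_in_sample(7, 1, 0.05, 1.0)
```

With these hypothetical counts the nose would carry about 1.3×10⁵ CFU and the blood about 140 CFU/ml.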

Example 9 Intranasal Immunization of Balb/c Mice Prior to Challenge

Six-week old, pathogen-free, Balb/c mice are purchased from Jackson Laboratories (Bar Harbor, Me.) and housed in cages under standard temperature, humidity, and lighting conditions. Balb/c mice, at 10 animals per group, are immunized with an appropriate amount of the protein to be tested on weeks 0, 2, and 4. On each occasion, the protein being tested is formulated with 0.1 μg of CT-E29H, and slowly instilled into the nostril of each mouse in a 10 μl volume. Mice immunized with Keyhole Limpet Hemocyanin (KLH)-CT-E29H are used as controls. Serum samples are collected 4 days after the last immunization.

Example 10 Mouse Intranasal Challenge Model

Balb/c mice are challenged on the sixth day of week 4 (i.e., at approximately 27 days) with 1×10⁵ CFU's of serotype 3 streptomycin resistant Streptococcus pneumoniae. Pneumococci are inoculated into 3 ml of Todd-Hewitt broth containing 100 μg/ml of streptomycin. The culture is grown at 37° C. until mid-log phase, then diluted to the desired concentration with Todd-Hewitt broth and stored on ice until use. Each mouse is anesthetized with 1.2 mg of ketamine HCl (Fort Dodge Laboratory, Ft. Dodge, Iowa) by i.p. injection. The bacterial suspension is inoculated into the nostril of anesthetized mice (10 μl per mouse). The actual dose of bacteria administered is confirmed by plate count. Four days after challenge, mice are sacrificed, and the noses are removed and homogenized in 3 ml of sterile saline with a tissue homogenizer (Ultra-Turrax T25, Janke & Kunkel Ika-Labortechnik, Staufen, Germany). The homogenate is 10-fold serially diluted in saline and plated on streptomycin containing TSA plates. Fifty μl of blood collected 2 days post-challenge from each mouse also are plated on the same kind of plates. Plates are incubated overnight at 37° C. and then colonies are counted.

REFERENCES

-   International Application No. EP A02323621
-   International Application No. EP 0036776
-   International Application No. EP 0859055
-   International Application No. EP 125,023
-   International Application No. EP 171,496
-   International Application No. EP 184,187
-   International Application No. EP 264166
-   International Application No. PCT/US86/02269
-   U.S. Pat. No. 4,196,265
-   U.S. Pat. No. 4,522,811
-   U.S. Pat. No. 4,554,101
-   U.S. Pat. No. 4,683,195
-   U.S. Pat. No. 4,683,202
-   U.S. Pat. No. 4,736,866
-   U.S. Pat. No. 4,816,567
-   U.S. Pat. No. 4,870,009
-   U.S. Pat. No. 4,873,191
-   U.S. Pat. No. 4,873,316
-   U.S. Pat. No. 4,987,071
-   U.S. Pat. No. 5,116,742
-   U.S. Pat. No. 5,223,409
-   U.S. Pat. No. 5,272,057
-   U.S. Pat. No. 5,283,317
-   U.S. Pat. No. 5,328,470
-   U.S. Pat. No. 5,498,531
-   U.S. Pat. No. 5,766,844
-   U.S. Pat. No. 5,789,654
-   U.S. Pat. No. 5,798,209
-   U.S. Pat. No. 6,201,103
-   U.S. SIR No. H1,892
-   International Application No. WO 86/01533
-   International Application No. WO 90/02809
-   International Application No. WO 90/11354
-   International Application No. WO 91/01140
-   International Application No. WO 91/17271
-   International Application No. WO 92/01047
-   International Application No. WO 92/0968
-   International Application No. WO 92/09690
-   International Application No. WO 92/15679
-   International Application No. WO 92/18619
-   International Application No. WO 92/20791
-   International Application No. WO 93/01288
-   International Application No. WO 93/04169
-   International Application No. WO 94/10300
-   International Application No. WO 94/16101
-   International Application No. WO 97/07668
-   International Application No. WO 97/07669
-   International Application No. WO 00/63364
-   Abravaya et al., Nucleic Acids Res., 23:675-682, 1995.
-   Adams et al., Nature 355:632-634, 1992.
-   Adams et al., Nature 377 Supp:3-174, 1995.
-   Adams et al., Science 252:1651-1656, 1991.
-   Altschul et al., “Gapped BLAST and PSI-BLAST: a new generation of protein database search programs,” Nuc. Acids Res. 25(17):3389-402, 1997.
-   Altschul et al., J. Molec. Biol. 215:403-410, 1990.
-   Amann et al., Gene 69:301-315, 1988.
-   Anderson, “Techniques for the preservation of three-dimensional structure in preparing specimens for the electron microscope.” Trans. N.Y. Acad. Sci. 13(130):130-134, 1951.
-   Bairoch and Apweiler, Nucleic Acids Research, 28:45-48, 2000.
-   Baldari et al., Embo J. 6:229-234, 1987.
-   Banerji et al., Cell, 33:729-740, 1983.
-   Barker et al., Nucleic Acids Research, 29:29-32, 2001.
-   Bartel and Szostak, Science 261:1411-1418, 1993.
-   Bartel et al., Biotechniques 14:920-924, 1993(b).
-   Bartel, “Cellular Interactions and Development: A Practical Approach”, pp. 153-179, 1993(a).
-   Bateman et al., “The Pfam protein families database,” Nucleic Acids Res., 28(1), 263-266, 2000.
-   Benson, “Tandem repeats finder: a program to analyze DNA sequences,” Nucleic Acids Res. 27(2):573-80, 1999.
-   Bradley, Current Opinion in Biotechnology 2:823-829, 1991.
-   Bradley, in “Teratocarcinomas and Embryonic Stem Cells: A Practical Approach,” E. J. Robertson, ed., IRL, Oxford, pp. 113-152, 1987.
-   Briles et al., “Intranasal immunization of mice with a mixture of the pneumococcal proteins PsaA and PspA is highly protective against nasopharyngeal carriage of Streptococcus pneumoniae,” Infect. Immun. 68(2):796-800, 2000.
-   Bunzow et al., Nature, 336:783-787, 1988.
-   Burge and Karlin, “Prediction of complete gene structures in human genomic DNA.” J. Mol. Biol. 268:78-94, 1997.
-   Butler et al., “Pneumococcal vaccines: history, current status, and future directions,” Am. J. Med. 107(1A):69S-76S, 1999.
-   Byrne and Ruddle, PNAS 86:5473-5477, 1989.
-   Calame and Eaton, Adv. Immunol. 43:235-275, 1988.
-   Camper and Tilghman, Genes Dev. 3:537-546, 1989.
-   Chen et al., PNAS 91:3054-3057, 1994.
-   Cohen et al., Adv. Chromatogr. 36:127-162, 1996.
-   Cotton et al., PNAS 85:4397, 1988.
-   Cotton, Mutat. Res. 285:125-144, 1993.
-   Cowan et al., “RGS Proteins: Lessons from the RGS9 subfamily,” Progress in Nucleic Acid Research and Molecular Biology 65:341-359, 2001.
-   Crain et al., “Pneumococcal surface protein A (PspA) is serologically highly variable and is expressed by all clinically important capsular serotypes of Streptococcus pneumoniae,” Infect. Immun. 58(10):3293-9, 1990.
-   Cserzo et al., “Prediction of transmembrane alpha-helices in prokaryotic membrane proteins: the dense alignment surface method,” Protein Engineering 10(6):673-6, 1997.
-   D'Eustachio et al., Science 220:919-924, 1983.
-   Devereux et al., Nucleic Acids Research 12(1):387, 1984.
-   Dintilhac et al., “Competence and virulence of Streptococcus pneumoniae: Adc and PsaA mutants exhibit a requirement for Zn and Mn resulting from inactivation of putative ABC metal permeases,” Mol. Microbiol. 25(4):727-739, 1997.
-   Doetschman et al., J. Embryol. Exp. Morphol. 87:27-45, 1985.
-   Douglas et al., “Antibody response to pneumococcal vaccination in children younger than five years of age,” J. Infect. Dis. 148:131-137, 1983.
-   Eddy, “Hidden Markov models” Current Opinion in Structural Biology 6(3):361-5, 1996.
-   Edlund et al., Science 230:912-916, 1985.
-   Eichelbaum, Clin. Exp. Pharmacol Physiol, 23(10-11):983-985, 1996.
-   Elledge et al., Proc. Natl. Acad. Sci. USA, 88:1731-1735, 1991.
-   Eng, McCormack and Yates, “An approach to correlate tandem mass-spectral data of peptides with amino-acid-sequences in a protein database,” Journal of the American Society for Mass Spectrometry 5:976-989, 1994.
-   Fan, Y. et al., PNAS, 87:6223-27, 1990.
-   Finley et al., Proc. Natl. Acad. Sci. USA, 91:12980-12984, 1994.
-   Foster and Hook, “Surface protein adhesins of Staphylococcus aureus,” Trends Microbiol. 6(12):484-8, 1998.
-   Fraser et al., “Genomic sequence of a Lyme disease spirochaete, Borrelia burgdorferi” Nature 390(6660):580-6, 1997.
-   Frohman et al., Proc. Natl. Acad. Sci. USA 85, 8998-9002, 1988.
-   Gaultier et al., Nucleic Acids Res. 15:6625-6641, 1987.
-   Gentz et al., Proc. Natl. Acad. Sci. USA, 86:821-824, 1989.
-   Goldstein and Garau, “30 years of penicillin-resistant S pneumoniae: myth or reality?,” Lancet 350(9073):233-4.
-   Gray, Conjugate Vaccines Supplement p694-697, 1990.
-   Griffin et al., Appl. Biochem. Biotechnol. 38:147-159, 1993.
-   Gunnar von Heijne, “Membrane Protein Structure Prediction, Hydrophobicity Analysis and the Positive-inside Rule” J. Mol. Biol., 225:487-494, 1992.
-   Harlow and Lane, “Antibodies: A Laboratory Manual,” Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1988.
-   Harper et al., Cell, 75:805-816, 1993.
-   Haselhoff and Gerlach, Nature 334:585-591, 1988.
-   Hausdorff et al., “Which pneumococcal serogroups cause the most invasive disease: implications for conjugate vaccine formulation and use, part I,” Clinical Infectious Diseases 30(1):100-21, 1997.
-   Hayashi, Genet. Anal. Tech. Appl. 9:73-79, 1992.
-   Helene et al., Ann. N.Y. Acad. Sci. 660:27-36, 1992.
-   Helene, Anticancer Drug Des. 6(6):569-84, 1991.
-   Hepler, “Emerging roles for RGS proteins in cell signalling,” Trends in Pharmacological Sciences 20:376-382, 1999.
-   Hernandez-Sanchez et al., “lambda bar minigene-mediated inhibition of protein synthesis involves accumulation of peptidyl-tRNA and starvation for tRNA,” EMBO Jour. 17(13):3758-65, 1998.
-   Hogan, “Manipulating the Mouse Embryo,” Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1986.
-   Inoue et al., FEBS Lett. 215:327-330, 1987(a).
-   Inoue et al., Nucleic Acids Res. 15:6131-6148, 1987(b).
-   Isberg and Tran Van Nhieu, “Binding and internalization of microorganisms by integrin receptors,” Trends in Microbiol. 2(1):10-4, 1994.
-   Iwabuchi et al., Oncogene 8:1693-1696, 1993.
-   Johnson et al., Endoc. Rev., 10:317-331, 1989.
-   Kaufman et al., EMBO J. 6:187-195, 1987.
-   Kessel and Gruss, Science 249:374-379, 1990.
-   Klein et al., Curr. Genet., 16:145-152, 1989(b).
-   Klein et al., Curr. Genet. 13:29-35, 1989(a).
-   Koebnik, “Proposal for a peptidoglycan-associating alpha-helical motif in the C-terminal regions of some bacterial cell-surface proteins,” Mol. Microbiol. 16(6):1269-70, 1995.
-   Krappa et al., “Evectins: Vesicular proteins that carry a pleckstrin homology domain and localize to post-Golgi membranes,” Proceedings of the National Academy of Sciences 96:4633-4368, 1999.
-   Kurjan and Herskowitz, Cell 933-943, 1982.
-   Kyte and Doolittle, J. Mol. Biol., 157:105-132, 1982.
-   Lakso et al., PNAS 89:6232-6236, 1992.
-   Laemmli, “Cleavage of structural proteins during the assembly of the head of bacteriophage T4,” Nature (London) 227:680-685, 1970.
-   Lefkowitz, Nature, 351:353-354, 1991.
-   Lennette, “General principles for laboratory diagnosis of viral, rickettsial, and chlamydial infections,” p. 17-18, diagnostic procedures for viral, rickettsial, and chlamydial infections, vol. 7th edition, 1995.
-   Lewis, “Programmed death in bacteria,” Microbiol. Mol. Biol. Rev. 64(3):503-14, 2000.
-   Li et al., Cell 69:915, 1992.
-   Linder, Clin. Chem. 43(2):254-266, 1997.
-   Loessner et al., “Evidence for a holin-like protein gene fully embedded out of frame in the endolysin gene of Staphylococcus aureus bacteriophage 187,” J. Bacteriol. 181(15):4452-60, 1999.
-   Lowry et al., “Protein measurement with the Folin-Phenol reagents,” J. Biol. Chem. 193:265-275, 1951.
-   Luckow and Summers, Virology 170:31-39, 1989.
-   Lukashin and Borodovsky, “GeneMark.hmm: new solutions for gene finding,” Nuc. Acids Res. 26(4):1107-15, 1998.
-   Madura et al., J. Biol. Chem. 268:12046-1205, 1993.
-   Maher, Bioessays 14(12):807-15, 1992.
-   Mansour et al., Nature 336:348, 1988.
-   Maxam and Gilbert, PNAS 74:560, 1977.
-   Mazmanian et al., “Staphylococcus aureus sortase, an enzyme that anchors surface proteins to the cell wall,” Science 285(5428):760-3, 1999.
-   McAtee et al., “Characterization of a Helicobacter pylori vaccine candidate by proteome techniques,” J. Chromatogr. B. Biomed. Sci. Appl. 714(2):325-33, 1998(c).
-   McAtee et al., “Identification of potential diagnostic and vaccine candidates of Helicobacter pylori by “proteome” technologies,” Helicobacter 3(3):163-9, 1998(a).
-   McAtee et al., “Identification of potential diagnostic and vaccine candidates of Helicobacter pylori by two-dimensional gel electrophoresis, sequence analysis, and serum profiling,” Clin. Diagn. Lab. Immunol 5(4):537-42, 1998(b).
-   McDaniel et al., “Monoclonal antibodies against protease-sensitive pneumococcal antigens can protect mice from fatal infection with Streptococcus pneumoniae,” J. Exp. Med. 160(2):386-97, 1984.
-   Mejlhede et al., “Ribosomal -1 frameshifting during decoding of Bacillus subtilis cdd occurs at the sequence CGA AAG,” J. Bacteriol. 181(9):2930-7, 1999.
-   Morrison et al., “Isolation and characterization of three new classes of transformation deficient mutants of Streptococcus pneumoniae that are defective in DNA transport and genetic recombination,” Journal of Bacteriology, 156:281-290, 1983.
-   Morin et al., Nucleic Acids Res., 21:2157-2163, 1993.
-   Myers et al., Nature 313:495, 1985(a).
-   Myers et al., Science 230:1242, 1985(b).
-   Nabors et al., “Immunization of healthy adults with a single recombinant pneumococcal surface protein A (PspA) variant stimulates broadly cross-reactive antibodies to heterologous PspA molecules,” Vaccine 18:1743-1754, 2000.
-   Nakai and Kanehisa, “Expert system for predicting protein localization sites in gram-negative bacteria,” Proteins 11(2):95-110, 1991.
-   Navarre and Schneewind, “Surface Proteins of Gram-Positive Bacteria and Mechanisms of Their Targeting to the Cell Wall Envelope,” Microbiol. Mol. Biol. Rev. 63(1):174-229, 1999.
-   Nielsen et al., “Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites,” Protein Engineering 10(1):1-6, 1997.
-   O'Gorman et al., Science 251:1351-1355, 1991.
-   Olmsted et al., “High-resolution visualization by field emission scanning electron microscopy of Enterococcus faecalis surface proteins encoded by the pheromone-inducible conjugative plasmid pCF10,” J. Bacteriol. 175(19):6229-37, 1993.
-   Orita et al., PNAS 86:2766, 1989.
-   Orihuela et al., “Peritoneal culture alters Streptococcus pneumoniae protein profiles and virulence properties,” Infect. Immun. 68:6082-6086, 2000.
-   Park and Teichmann, “DIVCLUS: an automatic method in the GEANFAMMER package that finds homologous domains in single- and multi-domain proteins,” Bioinformatics 14(2):144-50, 1998.
-   Parkhill et al., “Complete DNA sequence of a serogroup A strain of Neisseria meningitidis Z2491,” Nature 404(6777):502-6, 2000.
-   Pierschbacher and Ruoslahti, “Influence of stereochemistry of the sequence Arg-Gly-Asp-Xaa on binding specificity in cell adhesion,” J. Biol. Chem. 262(36):17294-8, 1987.
-   Pinkert et al., Genes Dev. 1:268-277, 1987.
-   Pizza et al., “Identification of vaccine candidates against serogroup B meningococcus by whole-genome sequencing,” Science 287(5459):1816-20, 2000.
-   Pugsley, “The complete general secretory pathway in gram-negative bacteria,” Microbiol. Rev. 57(1):50-108, 1993.
-   Queen and Baltimore, Cell 33:741-748, 1983.
-   Rahman et al., Journal of Neuroscience 19:2016-2026, 1999.
-   Rose et al., “Methods in Yeast Genetics: A Laboratory Course Manual.” Cold Spring Harbor Press, Cold Spring Harbor, N.Y. (1990).
-   Rosenow et al., “Contribution of novel choline-binding proteins to adherence, colonization and immunogenicity of Streptococcus pneumoniae,” Mol. Microbiol. 25(5):819-29, 1997.
-   Ross and Wilkie, “GTPase-activating proteins for Heterotrimeric G proteins: Regulators of G protein Signaling (RGS) and RGS-like proteins,” Annual Review of Biochemistry 69:795-827, 2000.
-   Saleeba et al., Meth. Enzymol. 217:286-295, 1992.
-   Salzberg et al., “Microbial gene identification using interpolated Markov models,” Nuc. Acids Res. 
26(2):544-8, 1998.-   Sambrook et al., “Molecular Cloning: A Laboratory Manual” 2nd, ed,    Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory Press,    Cold Spring Harbor, N.Y., 1989.-   Sampson et al., “Cloning and nucleotide sequence analysis of psaA,    the Streptococcus pneumoniae gene encoding a 37-kilodalton protein    homologous to previously reported Streptococcus sp. Adhesins,”    Infect. Immun. 62(1):319-24, 1994.-   Sanger, PNAS 74:5463, 1977.-   Schultz et al., Gene 54:113-123, 1987.-   Seed, Nature 329:840, 1987.-   Shinefield and Black, “Efficacy of pneumococcal conjugate vaccines    in large scale field trials (In Process Citation),” Pediatr. Infect.    Dis. J. 19(4):394-7, 2000.-   Simon et al., Science, 252:802-8, 1991.-   Smith and Johnson, Gene 67:31-40, 1988.-   Smith et al., Mol. Cell. Biol. 3:2156-2165, 1983.-   Songyang, et al., Cell 72:767-778, 1993.-   Sonnenberg and Belisle, “Definition of Mycobacterium tuberculosis    culture filtrate proteins by two-dimensional polyacrylamide gel    electrophoresis, N-terminal amino acid sequencing, and electrospray    mass spectrometry,” Infect. Immun. 65(11):4515-24, 1997.-   Sonnhammer et al., “A hidden Markov model for predicting    transmembrane helices in protein sequences,” Ismb 6:175-82, 1998.-   Stockbauer et al., “A natural variant of the cysteine protease    virulence factor of group A streptococcus with an    arginine-glycine-aspartic acid (RGD) motif preferentially binds    human integrins alphavbeta3 and alphaIIbbeta3 (In Process    Citation),”Proc. Natl. Acad. Sci. USA 96(1):242-7, 1999.-   Studier et al. “Gene Expression Technology” Methods in Enzymology    185, 60-89, 1990.-   Talkington et al., “Protection of mice against fatal pneumococcal    challenge by immunization with pneumococcal surface adhesin A    (PsaA),” Microb. Pathog. 
21(1):17-22, 1996.-   Tebbey et al., “Effective mucosal immunization against respiratory    syncytial virus using a genetically detoxified cholera holotoxin,    CT-E29H,” Vaccine 18(24):2723-34, 2000.-   Thomas and Capecchi, Cell 51:503, 1987.-   Weldingh et al., “Two-dimensional electrophoresis for analysis of    Mycobacterium tuberculosis culture filtrate and purification and    characterization of six novel proteins,” Infect. Immun.    66(8):3492-500, 1998.-   Wilmut et al., Nature 385:810-813, 1997.-   Wilson et al., Cell 37:767, 1984.-   Winoto and Baltimore. EMBO J. 8:729-733, 1989.-   Xu et al., “PHR1 encodes an abundant, pleckstrin homology    domain-containing Integral membrane protein in the photoreceptor    outer segments,” Journal of Biological Chemistry 274:35676-35685,    1999.-   Yamamoto et al., “A nontoxic adjuvant for mucosal immunity to    pneumococcal surface protein,” A. J. Immunol. 161(8):4115-21, 1998.-   Zervos et al., Cell 72:223-232, 1993.-   Zhang et al., 2001, “Recombinant PhpA Protein, a Unique Histidine    Motif-Containing Protein from Streptococcus pneumoniae, Protects    Mice against Intranasal Pneumococcal Challenge,” Infect. Immun.    69:3827-3836, 2001.

1. An isolated polypeptide comprising an amino acid sequence at least 95% identical to SEQ ID NO:363, wherein said polypeptide immunoreacts with seropositive serum of an individual infected with Streptococcus pneumoniae.
2. An isolated polypeptide of claim 1, wherein said polypeptide comprises the amino acid sequence of SEQ ID NO:363.
3. An isolated polypeptide of claim 2, wherein said polypeptide consists of the amino acid sequence of SEQ ID NO:363.
4. An immunogenic composition comprising a polypeptide of claim 1.
5. An immunogenic composition of claim 4, further comprising one or more adjuvants.
6. An isolated polynucleotide that encodes a polypeptide of claim 1.
7. An isolated polynucleotide of claim 6, wherein the polynucleotide comprises a nucleotide sequence at least 95% identical to a nucleotide sequence of SEQ ID NO:148, or a degenerate variant thereof.
8. An isolated polynucleotide of claim 7, wherein said polynucleotide comprises the nucleotide sequence of SEQ ID NO:148, or a degenerate variant thereof.
9. An isolated polynucleotide which hybridizes to a nucleotide sequence of SEQ ID NO:148, a complement thereof, or a degenerate variant thereof, under high stringency hybridization conditions.
10. A recombinant expression vector comprising a polynucleotide sequence of claim 6.
11. A genetically engineered host cell, transfected, transformed or infected with the vector of claim 10.
12. A host cell of claim 11, wherein the host cell is a bacterial cell and wherein the polynucleotide is expressed to produce the encoded polypeptide.
13. A method for producing a polypeptide which comprises culturing the genetically engineered host cell of claim 11 under conditions suitable to produce the polypeptide and recovering the polypeptide from the culture.
14. An antibody specific for a Streptococcus pneumoniae polypeptide of claim 1.
15. A method for the detection and/or identification of Streptococcus pneumoniae in a biological sample comprising: (a) contacting the sample with an oligonucleotide primer of a polynucleotide comprising the nucleotide sequence of SEQ ID NO:148, or a degenerate variant thereof, in the presence of nucleotides and a polymerase enzyme under conditions permitting primer extension; and (b) detecting the presence of primer extension products in the sample, wherein extension products indicate the presence of Streptococcus pneumoniae in the sample.
16. A method for the detection and/or identification of antibodies to Streptococcus pneumoniae in a biological sample comprising: (a) contacting the sample with an isolated polypeptide of claim 1 under conditions permitting immune complex formation; and (b) detecting the presence of an immune complex in the sample, wherein an immune complex indicates the presence of Streptococcus pneumoniae in the sample.